Heterogeneous migratable kernels

As I understand it, when multiple processes want to use the OpenCL devices on a system, access is mutually exclusive and first-come-first-serve, and a device will be idled if the host process is being pre-empted by the task scheduler. This is far from optimal, and is likely to happen regularly if OpenCL ever becomes popular enough.

For my master’s thesis, I’m planning to develop a system called Heterogeneous Processes on Heterogeneous Processing Units (HPHPU), which will enable better multitasking for OpenCL programs. It will act as an added layer between OpenCL and the applications. It will ensure that all applications use a context that includes all devices on the system, and it will dynamically assign kernels to devices based on both load balancing and performance considerations. It’ll also be able to set CPU affinities so that devices aren’t unnecessarily blocked waiting for a host process.

However, I’m concerned about backward compatibility. If an OpenCL app assumes its device choice has been respected, it may apply the wrong device-specific optimizations, use features the actual device lacks (which would then have to be implemented in software), and make clCreateProgramWithBinary() calls whose binaries would have to go through a decompiler. How common and problematic are these issues likely to be in practice?

We also need a way to tell the application which device(s) we’ve chosen for a particular kernel, and alert it if we migrate the kernel. It would also be useful if the application could supply hints saying, say, “This kernel’s performance will be about 1500 work items per second on device A, 2200 on device B, or 1800 on device C.” How much of an extension would these features be?
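To make the hint idea more concrete, here is a minimal Python sketch of how a scheduling layer might use such per-device throughput hints together with the current backlog on each device. Everything in it is hypothetical: `pick_device`, the `hints` and `queued_items` dictionaries, and the device names are illustrative only and are not part of OpenCL or of any existing extension. It also assumes the hinted rates are accurate and constant, which a real system could not take for granted.

```python
def pick_device(hints, queued_items, new_items):
    """Return the device with the earliest estimated finish time.

    hints:        device name -> hinted throughput (work items per second)
    queued_items: device name -> work items already waiting on that device
    new_items:    number of work items in the kernel launch being placed
    """
    def finish_time(device):
        # Estimated seconds until this launch would complete on the device,
        # assuming the hinted rate also applies to the existing backlog.
        backlog = queued_items.get(device, 0)
        return (backlog + new_items) / hints[device]

    return min(hints, key=finish_time)


# Hypothetical hints, using the figures from the example above.
hints = {"A": 1500, "B": 2200, "C": 1800}

# On an idle system the fastest device wins.
print(pick_device(hints, {}, 9000))

# If the fastest device already has a large backlog, another one is chosen.
print(pick_device(hints, {"B": 20000}, 9000))
```

The sketch deliberately ignores data transfer costs and migration, both of which would likely matter in practice.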

As I understand it, when multiple processes want to use the OpenCL devices on a system, access is mutually exclusive and first-come-first-serve, and a device will be idled if the host process is being pre-empted by the task scheduler.

Friendly advice: if you are going to write a master’s thesis on a subject, it is a good idea to make sure that your assumptions are correct before you start.

How common and problematic are these issues likely to be in practice?

You can simply list them as limitations of your work in your master’s thesis. In my experience, theses are not expected to deal with all the intricacies of production software.

“This kernel’s performance will be about 1500 work items per second on device A, 2200 on device B, or 1800 on device C.”

Notice that if that information were available to the application, it could do some forms of load balancing by itself.
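As a rough illustration of that point, an application that knew such rates could split a launch across devices in proportion to them. The Python sketch below assumes the rates are accurate and stable. `split_work` and the device names are made up for the example and do not correspond to any OpenCL call.

```python
def split_work(total_items, hints):
    """Divide total_items among devices in proportion to their hinted rates.

    hints: device name -> hinted throughput (work items per second, integer)
    Returns a dict mapping device name -> number of work items assigned.
    """
    total_rate = sum(hints.values())
    # Integer division keeps every share a whole number of work items.
    shares = {dev: total_items * rate // total_rate
              for dev, rate in hints.items()}
    # Hand any rounding leftovers to the fastest device.
    fastest = max(hints, key=hints.get)
    shares[fastest] += total_items - sum(shares.values())
    return shares


# Hypothetical rates taken from the quoted example.
hints = {"A": 1500, "B": 2200, "C": 1800}
shares = split_work(10000, hints)
print(shares)
```

With a split like this, each device would finish at roughly the same time if the hints held, which is the simplest form of static load balancing.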