Both AMD and NVIDIA use Clang and LLVM to compile OpenCL to AMD IL and PTX, respectively.
Interesting, never knew that. For ease of use and fast conversion from C/C++ code to PTX, would you recommend Clang or LLVM? Also, maybe I wouldn’t even need to modify the Clang or LLVM code, as conversion from C to PTX seems to be what I need here, and they seem to do that just fine?
I would just try it to see if it works.
I’ll need to study the driver API as my experience with either CUDA or OpenCL is minimal. I’m only just beginning to figure out the basics of the various compiler chains.
What I might try first is an all-OpenCL approach to see if I can get it up and running. After that, I might try to find some way for OpenCL and CUDA to cooperate.
[quote:3goky2pj]1: To make my task easier, can OpenCL directly generate CUBIN files?
Not in the current CUDA SDK or drivers. You can only get PTX.[/quote:3goky2pj]
FYI, when I try to use nvcc to compile from .cu to .ptx, it also requires the Microsoft Visual C compiler, cl.exe (even though, in theory, it shouldn’t need cl.exe at all). Unfortunately, Microsoft won’t allow me to redistribute their cl compiler (and I haven’t heard back yet from NVIDIA about redistributing nvcc either).
I don’t think you can redistribute the compiler. Currently, the compiler seems to take a few seconds to compile even small code. Once you get the PTX and cache it, subsequent builds are a lot quicker.
Just to be clear, when you say compiler here, I presume you mean NVIDIA’s version of the OpenCL compiler.
(sub-issue: I initially thought there was only “one OpenCL”, because it is able to run on any platform, but NVIDIA, AMD, Intel etc. each seem to be offering their own flavour. It’s a real shame they can’t join together and make available one single download for all.)
You mention it takes a few seconds to compile even small code (I presume you mean from source to PTX). That sounds like a potential issue, because I’d like compile times of half a second at most. If I use Clang or LLVM to convert from source to PTX, will that be any quicker?
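For what it’s worth, the caching idea mentioned above can be sketched without touching any GPU API. This is only a rough illustration under stated assumptions: `get_ptx`, `cache_key`, `compile_fn` and the driver version string are all hypothetical names made up for the example, and `compile_fn` stands in for whatever actually turns source into PTX (the OpenCL runtime, nvcc, or Clang/LLVM). The only point is that the slow compile runs once per (source, driver version) pair and every later call reads the PTX back from disk.

```python
import hashlib
import os
import tempfile


def cache_key(source: str, driver_version: str) -> str:
    """Hash the kernel source together with the driver version string."""
    h = hashlib.sha256()
    h.update(driver_version.encode("utf-8"))
    h.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") hash differently
    h.update(source.encode("utf-8"))
    return h.hexdigest()


def get_ptx(source, driver_version, compile_fn, cache_dir):
    """Return PTX for `source`, calling compile_fn only on a cache miss.

    compile_fn is a hypothetical callable (source text -> PTX text).
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, cache_key(source, driver_version) + ".ptx")
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            return f.read()  # cache hit: no compiler invocation
    ptx = compile_fn(source)  # cache miss: pay the compile cost once
    with open(path, "w", encoding="utf-8") as f:
        f.write(ptx)
    return ptx


# Demo with a stand-in compiler that just counts how often it is called.
compile_calls = []


def fake_compile(src):
    compile_calls.append(src)
    return "// pretend PTX for: " + src


with tempfile.TemporaryDirectory() as demo_dir:
    get_ptx("kernel source", "driver-1", fake_compile, demo_dir)  # compiles
    get_ptx("kernel source", "driver-1", fake_compile, demo_dir)  # cached
    get_ptx("kernel source", "driver-2", fake_compile, demo_dir)  # recompiles
```

Including the driver version in the key means a driver update forces a fresh compile, while an unchanged user script never pays the compile cost twice.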
[quote:3goky2pj]3: If I were to switch to OpenCL entirely, can I use it to create DLL files and for the main code to use an arbitrary function from within the DLL to execute (pointer to function needed I think) ?
Can you explain this better? All you get back from OpenCL is PTX which you could include as a resource section in the DLL. It is still better to include the OpenCL code and just compile it the first time your program runs after a driver version change so that you get the benefit of any bug fixes or performance enhancements included in future compilers.[/quote:3goky2pj]
Sure and thanks for asking…
Currently on the CPU, my graphics program has a section where the user can input ‘scripting’ code (essentially a function with parameters). My program then uses a bundled C/C++ compiler (say TCC, the Tiny C Compiler, which is unfortunately pretty slow, even considering it’s only CPU) to convert the user ‘script’ code into a DLL (which could potentially contain a few user functions). All this happens at runtime of the main program. My graphics program then loads the DLL containing the user-specified compiled function and calls that function from within the main program.
Okay imagine all that, but instead of on the CPU, imagine it all on the GPU under OpenCL, where not only my graphics program is GPU accelerated, but also the user’s DLL. Please let me know if that’s not clear or if you have any questions.
What’s the OpenCL equivalent of .cubin? Based on what you’ve said, I’m thinking a possible plan of action would be to use Clang or LLVM to convert to PTX, and then use ‘something’ to convert from PTX to the final binary object code that either the NVIDIA or AMD GPUs can understand.