In my case when it returned 1 the source didn’t compile. So I manage it as an error. By the way it’s a bug, and it should be reported to nVidia (but I don’t have an nVidia developer account).
You have a case in which the source should compile but clBuildProgram return 1?
Yeah…I figured it was a bug. I was on the 195.xx beta driver, and I’ve since rolled back to 190.29 and the code runs perfectly–without changes. I don’t have a developer account either. Is there a way to submit this bug without one?
I’m curious where those C++ bindings came from? Are they shipping with the NVidia windows sdk? As they look similar to the C++ bindings on the khronos registry page, but not exactly the same: http://www.khronos.org/registry/cl/api/1.0/cl.hpp
Ah. They’re working on making the one on the registry part of the OpenCL 1.1 standard. We’re in the phase of making some major revisions in preparation for that.So any discrepancies or wishes that you have for it, now is a good time to voice them.
For the most part, it looks exactly the same as what you have.
Thanks for the feedback. Just to get a discussion started I’m going to ask you about some of your points. Everything I don’t ask about assume I agree with.
Not sure what you mean by this, do you just mean that all the cl_* types should be an object? If so, I agree.
Not sure about this. What if the user wanted to do the following:
This idiom requires a default constructed “NULL” object. Also, for use with standard stl containers is a default constructor required? For example, vector::push_back could be implemented this way:
Every class could have a operator bool for testing whether an object was valid. And most class methods could just assert(operator bool()) to pick up the cases when the object was accidentally not initialized properly.
I do recognize the appeal of being assured the objects are always valid, I just don’t see a way of doing this and keeping their use flexible.
Is this for compile time type checking to make sure a function is passed a proper constant value? I like this idea.
I like this idea since the error codes returned by OpenCL can be less than informative. The assertions can contain reasonable error messages. Last week it took me a good part of a day to discover that my global work size was not divisible by the local work size. Something an assertion like this could have picked up easily.
Ultimately, the API has to come to a decision about how heavy weight it should be. Or from the other view point, how leaky the implementation should be. This paragraph is in the bindings for the registry:
The bindings themselves are lightweight and correspond closely to the
underlying C API. Using the C++ bindings introduces no additional execution
overhead.
For example this can be seen in the different way chosen to wrap “clCreateKernelsInProgram”. Your signature is this:
Return by value can be expensive compared to an output argument. Also, is there a good reason to choose std::list instead of std::vector? vectors are typically more performant for this sort of operation. (Note, the registry’s version should take a reference instead of a pointer, that will be changed.)
So some of the design comes down to the slightly subjective decision about how much C++ishness should be used. For the clCreateKernelsInProgram example your design is actually easier to eventually wrapping into higher level languages (python, java, etc). Thanks for the feedback. I’d love to hear anymore specific feedback you have.
Not sure what you mean by this, do you just mean that all the cl_* types should be an object? If so, I agree.
Yes, use object as soon as it’s possible.
Not sure about this. What if the user wanted to do the following:
You example can be simplified in one line but in the general case, it will use pointer (raw, scoped, or shared).
I think that’s the way a UML relation with 0…1 multiplicity should be implemented.
Also, for use with standard stl containers is a default constructor required
I don’t think so, It seems copy constructor and operator = are enough. http://www.sgi.com/tech/stl/Container.html (Value Type). But in case of container which need default constructible object I’ll use shared_ptr.
Is this for compile time type checking to make sure a function is passed a proper constant value? I like this idea.
Yes, I should have say : maximize compile time checking
Ultimately, the API has to come to a decision about how heavy weight it should be. Or from the other view point, how leaky the implementation should be. This paragraph is in the bindings for the registry:
I was talking about debug assertion, so lightweight assertion. But perhaps it could be good to use std::logic_error. Personally I think debug assertions are sufficient (By the way if developer run in release an invalid code, an exception will be raised because cl call won’t return CL_SUCCESS).
Return by value can be expensive compared to an output argument
I asked myself about always passing by reference instead of return by value, in case there is a copy constructor call.
In fact I only use this optimization (reference usage) if their is a real penalty. In many case compilers use RVO http://en.wikipedia.org/wiki/Return_value_optimization.
I prefer to use the return syntax, but it’s not really an important wish
The problem was that I did not have the right memory space chosen for Picture in GetPix function, I’ve commented out the #define line and the clBuildProgram returned right error with all the information, gathered by clGetProgramBuildInfo procedure. (BTW: clGetProgramBuildInfo returned only the source code with clBuildProgram returning 1)
IMHO issue is inside #defines, when a line with #define code has an error, then OpenCL Builder has issues finding the line with an error and it returns nothing.
My 2 cents:
I’d suggest being very careful about trying to check for things like the local work group size being evenly divisible into the global work group size. The reason for this is that you can check for some of them easily, but others require internal knowledge of the runtime/device state. (Such as checking that the total sum of local memory will fit on the device since this can be declared in the kernel as well as through the runtime, and hence requires access to the compiled kernel.)
I wonder if it is really better to provide a C++ interface that catches some (rather arbitrary) subset of the runtime errors at that level, or to rely on the actual OpenCL implementations to report back errors. E.g., in the case of incompatible local/global sizes, the clEnqueueNDRangeKernel command should have reported an error.
The OpenCL API provides a detailed error reporting mechanism through the context call back function. I know this is implemented on MacOS X to provide detailed, descriptive error messages for almost all errors. (E.g., it will tell you which of the local/global dimensions are not compatible, or that you are trying to use X bytes of local memory when you only have Y, in addition to returning a CL_ERROR code.)
I would think that using the context callbacks to get error codes from the runtime, and reporting those to the user would be more robust and generic. This would allow (and, more importantly, encourage) the vendors to provide better error checking and reporting, without having the C++ API re-implement some subset of the same error-checking as the vendors.
Of course it may be that more error checking is always better, but it will complicate the C++ API code.
I didn’t realize the call back functions reported such detail. I would agree it would be cleaner to keep all error checking at the same abstraction layer. Now the fun C++ trick will become registering a callback function to grab the error message text to dump into a C++ exception. Especially since the call back is registered on the context, but could be triggered by any number of threads simultaneously. How is the user suppose to map a notification function invocation back to the thread that triggered the error?
You can pass user private data into the context callback (if I remember correctly) which can then be mapped to the thread by the user.
Also, the detail of the provided information may vary from platform to platform. My only experience with the context call back is on Apple’s platform where it is very good. (They also provide the cool ability to dump all context messages to the console by setting CL_LOG_ERRORS=stdout.)
I see the “user_data” argument to clCreateContext. But that still doesn’t help much for mapping error messages back to the thread that incurred it. For example, a main thread could set up a context with two devices.
main thread:
context = Context([device1, device2], pfn_notify)
A thread is then spawned to handle computation on the two separate devices, and then they both do something nasty to invoke an error in the OpenCL implementation:
thread1:
queue = CommandQueue(context, device1)
// do something nasty
thread2:
queue = CommandQueue(context, device2)
// do something nasty
Then pfn_notify will be invoked twice, and even if user_data contained the pointers to each of the threads, it couldn’t know which caused the error. The problem is this sentence in the standard, “This callback function may be called asynchronously by the OpenCL implementation.” If we were guaranteed that the callback function was executed by the same thread that caused the error we could store the error message in thread local data and recover it later. This is how ‘errno’ works in the posix standard.
I think you’re right. I’ve always figured I’d keep one context per thread to avoid having to manually lock around CL calls so then my mapping would work.