clBuildProgram return 1

Hello,

How could clBuildProgram return 1?
After reading the specification and the cl.h i saw that it can’t return 1 (just <= 0, success or error).

Is it a bug?

my code :

cl::PlatformList platforms = cl::GetPlatformList();

if(platforms.empty())
{
	return;
}

cl::Platform & platform = platforms.front();

cl::DeviceList devices = platforms.front().GetDeviceList(deviceType);

if(devices.empty())
{
	return;
}

cl::Device &	     device		(devices.front());
cl::Context		     context		(device);
cl::CommandQueue  commandQueue(context, device);
cl::Buffer		     buffer		(context, cl::Mem::WRITE_ONLY, ARRAY_SIZE * sizeof(Array::value_type));
cl::Program	     program	(context, sample::LoadSourceFile("sample1.cl"));

try
{
	program.Build();
}
catch (std::exception const & e)
{
}

ps:
I use nVidia 195.55 under vista 64.

Same problem here.

The examples work, but my code returns 1 instead of an error code or CL_SUCCESS. Did you ever work it out?

I’m on 32 bit Ubuntu, nvidia 8400M.

Thanks,

In my case when it returned 1 the source didn’t compile. So I manage it as an error. By the way it’s a bug, and it should be reported to nVidia (but I don’t have an nVidia developer account).

You have a case in which the source should compile but clBuildProgram return 1?

Yeah…I figured it was a bug. I was on the 195.xx beta driver, and I’ve since rolled back to 190.29 and the code runs perfectly–without changes. I don’t have a developer account either. Is there a way to submit this bug without one?

I’m curious where those C++ bindings came from? Are they shipping with the NVidia windows sdk? As they look similar to the C++ bindings on the khronos registry page, but not exactly the same: http://www.khronos.org/registry/cl/api/1.0/cl.hpp

-Brian

It’s just a personal c++ binding.

Ah. They’re working on making the one on the registry part of the OpenCL 1.1 standard. We’re in the phase of making some major revisions in preparation for that.So any discrepancies or wishes that you have for it, now is a good time to voice them.

For the most part, it looks exactly the same as what you have.

This is my wrapper API : http://matrem84.free.fr/clapi/html, so the c++ API I wished to use :slight_smile: .

some ideas :

  • all OpenCL call are wrapped
  • transparent retain/release (constructor/destructor/operator =)
  • hard usage of objects
  • all instance are in a valid state as soon as they are instanciated
  • hard usage of enum
  • assert all code constaints (for example parameter constaints from the OpenCL specification)
  • throw exception for all other problems (for example all OpenCL call that don’t return CL_SUCCESS)

and for compatibility with other libs :

  • possibility to import “raw cl_object” to create c++ API objects
  • possibility to export “raw cl_object” from c++ API objects

Thanks for the feedback. Just to get a discussion started I’m going to ask you about some of your points. Everything I don’t ask about assume I agree with. :slight_smile:

Not sure what you mean by this, do you just mean that all the cl_* types should be an object? If so, I agree.

Not sure about this. What if the user wanted to do the following:


cl::Context context;
if (foo)
  context = cl::Context(CL_DEVICE_TYPE_CPU;
else
  context = cl::Context(CL_DEVICE_TYPE_GPU;

This idiom requires a default constructed “NULL” object. Also, for use with standard stl containers is a default constructor required? For example, vector::push_back could be implemented this way:


template<T>
class vector {
push_back(const T& obj) {
  resize(size()+1);
  *this[size()-1] = obj;
}
}

Every class could have a operator bool for testing whether an object was valid. And most class methods could just assert(operator bool()) to pick up the cases when the object was accidentally not initialized properly.

I do recognize the appeal of being assured the objects are always valid, I just don’t see a way of doing this and keeping their use flexible.

Is this for compile time type checking to make sure a function is passed a proper constant value? I like this idea.

I like this idea since the error codes returned by OpenCL can be less than informative. The assertions can contain reasonable error messages. Last week it took me a good part of a day to discover that my global work size was not divisible by the local work size. Something an assertion like this could have picked up easily.

Ultimately, the API has to come to a decision about how heavy weight it should be. Or from the other view point, how leaky the implementation should be. This paragraph is in the bindings for the registry:

  • The bindings themselves are lightweight and correspond closely to the
  • underlying C API. Using the C++ bindings introduces no additional execution
  • overhead.

For example this can be seen in the different way chosen to wrap “clCreateKernelsInProgram”. Your signature is this:


KernelList cl::Program::CreateAllKernels()

The registry’s is this:


cl_int cl::Program::createKernels(VECTOR_CLASS<Kernel>* kernels)

Return by value can be expensive compared to an output argument. Also, is there a good reason to choose std::list instead of std::vector? vectors are typically more performant for this sort of operation. (Note, the registry’s version should take a reference instead of a pointer, that will be changed.)

So some of the design comes down to the slightly subjective decision about how much C++ishness should be used. For the clCreateKernelsInProgram example your design is actually easier to eventually wrapping into higher level languages (python, java, etc). Thanks for the feedback. I’d love to hear anymore specific feedback you have.

-Brian

Not sure what you mean by this, do you just mean that all the cl_* types should be an object? If so, I agree.

Yes, use object as soon as it’s possible.

Not sure about this. What if the user wanted to do the following:

You example can be simplified in one line but in the general case, it will use pointer (raw, scoped, or shared).
I think that’s the way a UML relation with 0…1 multiplicity should be implemented.

Also, for use with standard stl containers is a default constructor required

I don’t think so, It seems copy constructor and operator = are enough. http://www.sgi.com/tech/stl/Container.html (Value Type). But in case of container which need default constructible object I’ll use shared_ptr.

Is this for compile time type checking to make sure a function is passed a proper constant value? I like this idea.

Yes, I should have say : maximize compile time checking

Ultimately, the API has to come to a decision about how heavy weight it should be. Or from the other view point, how leaky the implementation should be. This paragraph is in the bindings for the registry:

I was talking about debug assertion, so lightweight assertion. But perhaps it could be good to use std::logic_error. Personally I think debug assertions are sufficient (By the way if developer run in release an invalid code, an exception will be raised because cl call won’t return CL_SUCCESS).

Return by value can be expensive compared to an output argument

I asked myself about always passing by reference instead of return by value, in case there is a copy constructor call.
In fact I only use this optimization (reference usage) if their is a real penalty. In many case compilers use RVO http://en.wikipedia.org/wiki/Return_value_optimization.
I prefer to use the return syntax, but it’s not really an important wish

how much C++ishness should be used

It’s a c++ API so I would say : full c++ usage :stuck_out_tongue:

I’ve just had the same problem: clBuildProgram returning 1.

The issue was:

unsigned char GetPix ( unsigned char* Picture, unsigned int width, unsigned int height, unsigned int x, unsigned int y )

#define GETPIX(xz,yz) GetPix ( Picture, width, height, x+xz, y+yz );i++;

__kernel void WireFrame( __global unsigned char* Picture, __global unsigned char* WireFrame, unsigned char delta )

The problem was that I did not have the right memory space chosen for Picture in GetPix function, I’ve commented out the #define line and the clBuildProgram returned right error with all the information, gathered by clGetProgramBuildInfo procedure. (BTW: clGetProgramBuildInfo returned only the source code with clBuildProgram returning 1)

IMHO issue is inside #defines, when a line with #define code has an error, then OpenCL Builder has issues finding the line with an error and it returns nothing.

My 2 cents:
I’d suggest being very careful about trying to check for things like the local work group size being evenly divisible into the global work group size. The reason for this is that you can check for some of them easily, but others require internal knowledge of the runtime/device state. (Such as checking that the total sum of local memory will fit on the device since this can be declared in the kernel as well as through the runtime, and hence requires access to the compiled kernel.)

I wonder if it is really better to provide a C++ interface that catches some (rather arbitrary) subset of the runtime errors at that level, or to rely on the actual OpenCL implementations to report back errors. E.g., in the case of incompatible local/global sizes, the clEnqueueNDRangeKernel command should have reported an error.

The OpenCL API provides a detailed error reporting mechanism through the context call back function. I know this is implemented on MacOS X to provide detailed, descriptive error messages for almost all errors. (E.g., it will tell you which of the local/global dimensions are not compatible, or that you are trying to use X bytes of local memory when you only have Y, in addition to returning a CL_ERROR code.)

I would think that using the context callbacks to get error codes from the runtime, and reporting those to the user would be more robust and generic. This would allow (and, more importantly, encourage) the vendors to provide better error checking and reporting, without having the C++ API re-implement some subset of the same error-checking as the vendors.

Of course it may be that more error checking is always better, but it will complicate the C++ API code.

I didn’t realize the call back functions reported such detail. I would agree it would be cleaner to keep all error checking at the same abstraction layer. Now the fun C++ trick will become registering a callback function to grab the error message text to dump into a C++ exception. Especially since the call back is registered on the context, but could be triggered by any number of threads simultaneously. How is the user suppose to map a notification function invocation back to the thread that triggered the error?

You can pass user private data into the context callback (if I remember correctly) which can then be mapped to the thread by the user.

Also, the detail of the provided information may vary from platform to platform. My only experience with the context call back is on Apple’s platform where it is very good. (They also provide the cool ability to dump all context messages to the console by setting CL_LOG_ERRORS=stdout.)

I see the “user_data” argument to clCreateContext. But that still doesn’t help much for mapping error messages back to the thread that incurred it. For example, a main thread could set up a context with two devices.


main thread:
context = Context([device1, device2], pfn_notify)

A thread is then spawned to handle computation on the two separate devices, and then they both do something nasty to invoke an error in the OpenCL implementation:


thread1:
queue = CommandQueue(context, device1)
// do something nasty


thread2:
queue = CommandQueue(context, device2)
// do something nasty

Then pfn_notify will be invoked twice, and even if user_data contained the pointers to each of the threads, it couldn’t know which caused the error. The problem is this sentence in the standard, “This callback function may be called asynchronously by the OpenCL implementation.” If we were guaranteed that the callback function was executed by the same thread that caused the error we could store the error message in thread local data and recover it later. This is how ‘errno’ works in the posix standard.

Or am I missing a trick somewhere?

-Brian

I think you’re right. I’ve always figured I’d keep one context per thread to avoid having to manually lock around CL calls so then my mapping would work.