CommandQueue async methods don't release Events properly

Hello,
I found a strange behaviour in the header cl.hpp, and I’d like some feedback on this.

In the very example on top of cl.hpp I find these instructions:

        cl::Event event;
       cl::CommandQueue queue(context, devices[0], 0, &err);
       queue.enqueueNDRangeKernel(
           kernel, 
           cl::NullRange, 
           cl::NDRange(4,4),
           cl::NullRange,
           NULL,
           &event); 
 
       event.wait();

So I think I’m supposed to pass a pointer to an Event object.
However when inside the method enqueueNDRangeKernel the Event object pointer is casted to a cl_event pointer, like this:

::clEnqueueNDRangeKernel(
                object_, kernel(), (cl_uint) global.dimensions(),
                offset.dimensions() != 0 ? (const ::size_t*) offset : NULL,
                (const ::size_t*) global,
                local.dimensions() != 0 ? (const ::size_t*) local : NULL,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (cl_event*) event),

I see 2 problems in this instruction:

  1. When debugging I know for sure that this doesn’t release the previous cl_event stored in the Event wrapper. If you put this code in a loop it’ll fill up your RAM. On the other hand there is no easy way to release the previous Event easily, since the Wrapper::release method is protected, the fastest way is to assign a null Event.
  2. Why are we casting an Event pointer to a cl_event pointer? I don’t see any operator& overloading in the wrapper that permits us to do such a thing. I’m no C++ guru, if somebody can explain me that i’d be glad to understand. Is it because the cl_event is the first and only field in the class (therefore it starts exactly where the Event object starts)?

Thank you for any feedback,
Federico

Thanks, fedechicco. I have created a bug report in the public Khronos bugzilla to keep track of this.

Hi Federico,

Can you post a more complete example of the memory leak. Including what OpenCL implementation you are using. The following code is rock solid constant in terms of memory usage on Apple’s implementation on a CPU device:


#include <iostream>

#define __CL_ENABLE_EXCEPTIONS
#include "cl.hpp"

const char * helloStr  = "__kernel void "
                         "hello(void) "
                         "{ "
                         "  "
                         "} ";

int main(int, char *[])
{
  try 
  {
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);
    assert(!platforms.empty());

    std::vector<cl::Device> devices;
    platforms.front().getDevices(CL_DEVICE_TYPE_ALL, &devices);
    assert(!devices.empty());

    cl::Context context(devices); 

    cl::Program::Sources source(1,
                                std::make_pair(helloStr,strlen(helloStr)));
    cl::Program program = cl::Program(context, source);
    program.build(devices);

    cl::Kernel kernel(program, "hello");
    for (std::vector<cl::Device>::iterator d = devices.begin(); d != devices.end(); ++d)
    {
      for (unsigned int i = 0; i < 10000000; i++)
      {
        cl::CommandQueue queue(context, *d);

        cl::Event event;
        queue.enqueueNDRangeKernel(kernel, 
                                   cl::NullRange, 
                                   cl::NDRange(1000), 
                                   cl::NullRange,
                                   NULL,
                                   &event);
        event.wait();
      }
    }
  } 
  catch (cl::Error err) 
  {
    std::cerr << "ERROR: "
              << err.what()
              << "("
              << err.err()
              << ")"
              << std::endl;
    return 1;
  }

  return 0;
}

In response to #2 it is safe to cast an cl::Event pointer to a cl_event pointer because the implementation of cl::Event is essentially the following:


struct Event
{
  cl_event obj_;
};

-Brian

In this code you declare “cl::Event event” inside the loop, so event.obj_ is allocated and set to null. After you set it to null you give it to the enqueueNDRangeKernel and it overwrites it, no problems with that.
Finally you wait for the event and call cl::Event destructor on the “event” object when its scope ends, that frees the resource calling Wrapper::release on the cl::Event object.

If you try instead a code like this:


      cl::Event event;
      for (unsigned int i = 0; i < 10000000; i++)
      {
        cl::CommandQueue queue(context, *d);
        queue.enqueueNDRangeKernel(kernel, 
                                   cl::NullRange, 
                                   cl::NDRange(1000), 
                                   cl::NullRange,
                                   NULL,
                                   &event);
        event.wait();
      }

you have nothing in the inner loop that calls the Wrapper::release method, because enqueueNDRangeKernel doesn’t do that, it simply overwrites the event.obj_ field.

The code i wrote should reproduce my memory leak, even though I’d expect it to have the same behaviour of the code you posted.

Cheers,
Fede

Ah yes, I see it now. For a temporary workaround you can use a temporary cl::Event object and then assign it back. operator= will do the proper thing with retain and release.


      cl::Event event;
      for (unsigned int i = 0; i < 10000000; i++)
      {
        cl::CommandQueue queue(context, *d);

        cl::Event tmp;
        queue.enqueueNDRangeKernel(kernel, 
                                   cl::NullRange, 
                                   cl::NDRange(1000), 
                                   cl::NullRange,
                                   NULL,
                                   &tmp);
        event = tmp;
        event.wait();
      }

I’ll work on getting a fix into the bindings for the next version.

Yeah, i fixed it someway, it was easy once I figured the problem out.

Thanks a lot for fixing this,
Fed