Currently, I am writing an algorithm for a 9800 GT using OpenCL. I used the OpenCL Visual Profiler to examine performance, and I found that "clReleaseMemObject" takes as much time as "clEnqueueReadBuffer" for the same memory object. I only want to deallocate GPU memory to free up space; I don't need to read the data back.
Do you know why "clReleaseMemObject" takes so much time?
Is it an Nvidia issue, or is it the same on ATI GPUs?
Do you know another, faster way to deallocate memory?

Thanks a lot.