OpenCL wrapper overhead

Every time setKernelArgs is called, it calls clReleaseMemObject and then clRetainMemObject. This really adds up: 10 setKernelArgs calls take at least 0.2-0.4 ms.
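To make the pattern concrete, here is a minimal sketch of what I mean. It is not the wrapper's actual code: FakeMem, fakeRetain, fakeRelease, KernelArgs, and countRefcountCalls are all hypothetical stand-ins so that it runs without an OpenCL runtime, and the fake functions only count calls instead of calling clRetainMemObject / clReleaseMemObject. It compares rebinding the same buffer on every call with a variant that skips the reference-count traffic when the buffer has not changed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a cl_mem object; starts with the caller's own reference.
struct FakeMem { int refcount = 1; };

// Counts every simulated retain/release so the overhead can be compared.
static int g_refcount_calls = 0;
void fakeRetain(FakeMem* m)  { ++m->refcount; ++g_refcount_calls; }
void fakeRelease(FakeMem* m) { --m->refcount; ++g_refcount_calls; }

// Hypothetical wrapper that remembers which buffer is bound to each argument slot.
class KernelArgs {
public:
    explicit KernelArgs(std::size_t n) : slots_(n, nullptr) {}
    ~KernelArgs() {
        for (FakeMem* m : slots_) if (m) fakeRelease(m);
    }

    // The pattern described in the post: release the old buffer, then retain
    // the new one, on every call, even if it is the same buffer.
    void setNaive(std::size_t i, FakeMem* m) {
        if (slots_[i]) fakeRelease(slots_[i]);
        fakeRetain(m);
        slots_[i] = m;
    }

    // Possible alternative (an assumption, not the wrapper's behavior):
    // do nothing when the slot already holds this buffer.
    void setCached(std::size_t i, FakeMem* m) {
        if (slots_[i] == m) return;
        setNaive(i, m);
    }

private:
    std::vector<FakeMem*> slots_;
};

// Binds the same buffer to slot 0 `rebinds` times and returns how many
// retain/release calls that cost, including the final release on teardown.
int countRefcountCalls(bool cached, int rebinds) {
    g_refcount_calls = 0;
    FakeMem buf;
    {
        KernelArgs args(1);
        for (int i = 0; i < rebinds; ++i) {
            if (cached) args.setCached(0, &buf);
            else        args.setNaive(0, &buf);
        }
    }
    return g_refcount_calls;
}
```

In this sketch, 10 rebinds of the same buffer cost 20 reference-count calls with setNaive and 2 with setCached. Whether a real wrapper can skip the calls safely depends on how it tracks buffer lifetimes.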

What implementation are you talking about?