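It is certainly possible for OpenCL to be faster than OpenMP. Are you using OpenCL vector types (such as float4) in your kernel? If so, a CPU implementation should map these to the appropriate SSE instructions, which should give you an additional speedup in CL. OpenMP by itself only distributes loop iterations across threads; whether the loop body is compiled to SSE depends on the compiler's auto-vectorizer, which does not always succeed. When it fails and the code stays scalar, I believe there is a 2 - 2.5x delta in performance that can be achieved using CL vector types.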
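To make the vector-type point concrete, here is a minimal sketch in plain C of what a float4 multiply-add roughly corresponds to on an SSE-capable CPU. The function names `saxpy_scalar` and `saxpy_vec4` are hypothetical, the SAXPY operation is only an illustrative workload, and the code an actual OpenCL CPU runtime generates may look different. On targets without SSE, the sketch falls back to the scalar loop.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_mul_ps, ... */
#endif

/* Scalar version: one float per iteration. This is what a loop body
 * looks like if the compiler does not vectorize it. */
void saxpy_scalar(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* 4-wide version: roughly what an OpenCL kernel body such as
 *     y[i] = a * x[i] + y[i];   // with x, y declared as float4 pointers
 * could be lowered to on SSE (assumption, for illustration only). */
void saxpy_vec4(size_t n, float a, const float *x, float *y) {
    size_t i = 0;
#if defined(__SSE__)
    const __m128 va = _mm_set1_ps(a);          /* broadcast a to 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);       /* unaligned load of 4 floats */
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(y + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
    }
#endif
    /* Remainder elements (or everything, if SSE is unavailable). */
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Both functions compute the same result; the second processes four floats per instruction, which is where the extra speedup from CL vector types would come from. If you stay with OpenMP, it is worth checking your compiler's vectorization report to see whether the loop was actually vectorized.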