OpenMP / OpenCL CPU

How can OpenCL be faster than OpenMP on the same device?

I run a very simple Perlin noise algorithm on my Core i7 and get these results:
33554432 elements, 8 work groups

Simple CPU: 1.33 sec
OpenMP CPU: 0.27 sec (4.9 times faster than simple CPU)
OpenCL CPU: 0.15 sec (8.7 times faster than simple CPU)

Is this normal, or even possible, or is it just a bug in my OpenCL code?

Configuration:
Vista 64
NVIDIA GTX 275, 195.62
Intel Core i7
AMD ATI Stream SDK 2 beta 4
Visual Studio 2008

It is certainly possible for OpenCL to be faster than OpenMP. Are you using OpenCL vector types (float4, for example) in your kernel? If so, they should be mapped to the corresponding SSE instructions on your CPU, which gives you an additional speedup in CL. A plain OpenMP parallel for only spreads the loop across threads, and I do not believe OpenMP compilers generate packed SSE code for the loop body, so CL vector types can account for a 2-2.5x difference in performance.
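To make the SSE point concrete, here is a minimal C sketch of the difference. It is not the Perlin code from the question: the multiply-add body and the names scale_bias_scalar and scale_bias_sse are made up for illustration. The first function is what a scalar OpenMP loop computes, one float at a time per thread. The second is roughly what a float4 expression such as a * x + b could turn into on the CPU, with four floats handled by each instruction. It assumes an x86 target and an element count that is a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics, x86/x86-64 only */

/* Scalar version: one float per iteration. The pragma splits the loop
   across threads when OpenMP is enabled and is ignored otherwise. */
void scale_bias_scalar(const float *in, float *out, size_t n, float a, float b)
{
    long i;
    #pragma omp parallel for
    for (i = 0; i < (long)n; ++i)
        out[i] = a * in[i] + b;
}

/* Packed version: four floats per iteration, the kind of code a float4
   multiply-add might map to. Hypothetical helper; n must be a multiple of 4. */
void scale_bias_sse(const float *in, float *out, size_t n, float a, float b)
{
    size_t i;
    __m128 va = _mm_set1_ps(a);  /* broadcast a to all four lanes */
    __m128 vb = _mm_set1_ps(b);  /* broadcast b to all four lanes */

    assert(n % 4 == 0);
    for (i = 0; i < n; i += 4) {
        __m128 v = _mm_loadu_ps(in + i);        /* load 4 floats (unaligned ok) */
        v = _mm_add_ps(_mm_mul_ps(va, v), vb);  /* a * x + b on all 4 lanes */
        _mm_storeu_ps(out + i, v);              /* store 4 results */
    }
}
```

Both functions produce the same output for the same input. The packed one simply issues about a quarter as many arithmetic instructions, which is where the extra factor on top of threading would come from if the scalar loop is not vectorized.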

I only use the scalar float type: a float buffer and float operations.
Perhaps the AMD OpenCL compiler is very smart (even on an Intel processor :P).

It’s great that it can be even more powerful than OpenMP on the CPU.

Thanks for the explanation.