OpenCL and SSE

When running an OpenCL data-parallel kernel on an SSE-enabled CPU, does OpenCL automatically generate SSE code that maps work-items to the lanes of the SSE units? Or do you have to write the kernel with the OpenCL vector data types to take advantage of SSE?

The manual seems to suggest that SSE code is generated automatically when using the data-parallel programming model, while with the task-parallel model you have to use the vector data types. However, I’ve seen some comments around the web that seem to suggest you always have to use the vector data types to get SSE code…

My understanding of current OpenCL compilers is that they will not run multiple work-items across the lanes of an SSE vector at the same time. I’m not sure the SSE instruction set is sufficiently vector-complete to allow that in general. (Intel claimed Larrabee’s was, for example.) For now, you need to use the vector types to get SSE code.
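
To make the distinction concrete, here is a rough sketch in plain C (not OpenCL itself) of the two styles. The first function is what a kernel written with plain float amounts to: one element per work-item, all scalar. The second is roughly what you are asking for when you write the kernel with float4: four elements per step, which maps directly onto one SSE register. The names saxpy_scalar, saxpy_vec4 and HAVE_SSE are made up for this illustration, and the scalar fallback is only there so the code still builds on a non-SSE target.

```c
#include <stddef.h>

/* HAVE_SSE is a hypothetical macro for this sketch: use SSE intrinsics
   when the compiler targets SSE, otherwise fall back to scalar code. */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Scalar style: one element per "work-item", like a kernel using float.
   The loop index i plays the role of get_global_id(0). */
void saxpy_scalar(float a, const float *x, const float *y, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a * x[i] + y[i];
}

/* Vector style: four elements per "work-item", like a kernel using float4.
   Each iteration of the first loop fills one 128-bit SSE register. */
void saxpy_vec4(float a, const float *x, const float *y, float *out, size_t n)
{
    size_t i = 0;
#if HAVE_SSE
    const __m128 va = _mm_set1_ps(a);            /* broadcast a to all 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);         /* unaligned load of 4 floats */
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
    }
#endif
    /* Remainder elements (or everything, when SSE is unavailable). */
    for (; i < n; ++i)
        out[i] = a * x[i] + y[i];
}
```

Both functions compute the same result; the difference is only in who does the packing into SSE lanes. With the vector types, you do it yourself in the kernel source.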