Udeepta Bordoloi at AMD has posted the following convolution tutorial for OpenCL:

http://developer.amd.com/gpu/ATIStreamS ... penCL.aspx

The tutorial targets only the CPU, but it includes a nice description of how to vectorize your kernel, and there is a performance comparison to OpenMP. Unfortunately, the example does not use local memory, which is really important for performance on the GPU, but it's still a good place to look for a non-trivial OpenCL example program.
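For anyone wondering what the local-memory version would look like, here is a rough plain-C sketch of the usual tiling pattern. It is not taken from the tutorial. The names (TILE, RADIUS, convolve_direct, convolve_tiled) and the zero padding at the image border are my own assumptions for illustration. Each "work-group" first copies its tile plus a halo into a small scratch array, then computes every output in the tile from that scratch array instead of re-reading the input. In a real OpenCL kernel the scratch array would be a __local buffer, the copy would be shared among the work-items, and a barrier(CLK_LOCAL_MEM_FENCE) would separate the two stages.

```c
#include <assert.h>

#define TILE   4                    /* hypothetical work-group edge length */
#define RADIUS 1                    /* filter radius, assumed for this sketch */
#define FW     (2 * RADIUS + 1)     /* filter width: FW x FW taps */
#define PATCH  (TILE + 2 * RADIUS)  /* tile plus halo on each side */

/* Read one input pixel, treating everything outside the image as zero. */
static float load_padded(const float *in, int w, int h, int x, int y) {
    return (x < 0 || y < 0 || x >= w || y >= h) ? 0.0f : in[y * w + x];
}

/* Direct version: every output re-reads its FW x FW neighbourhood from the
 * input array (the "global memory" in GPU terms). The filter is applied
 * unflipped. */
void convolve_direct(const float *in, const float *filt, float *out, int w, int h) {
    assert(w > 0 && h > 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float acc = 0.0f;
            for (int fy = 0; fy < FW; ++fy)
                for (int fx = 0; fx < FW; ++fx)
                    acc += filt[fy * FW + fx] *
                           load_padded(in, w, h, x + fx - RADIUS, y + fy - RADIUS);
            out[y * w + x] = acc;
        }
}

/* Tiled version: the two outer loops stand in for the grid of work-groups. */
void convolve_tiled(const float *in, const float *filt, float *out, int w, int h) {
    assert(w > 0 && h > 0);
    float patch[PATCH][PATCH];      /* would be a __local array in a kernel */
    for (int ty = 0; ty < h; ty += TILE)
        for (int tx = 0; tx < w; tx += TILE) {
            /* Stage 1: read the tile plus halo from the input exactly once. */
            for (int py = 0; py < PATCH; ++py)
                for (int px = 0; px < PATCH; ++px)
                    patch[py][px] = load_padded(in, w, h,
                                                tx + px - RADIUS, ty + py - RADIUS);
            /* In a kernel, barrier(CLK_LOCAL_MEM_FENCE) would go here. */
            /* Stage 2: compute each output of the tile from the scratch copy. */
            for (int ly = 0; ly < TILE && ty + ly < h; ++ly)
                for (int lx = 0; lx < TILE && tx + lx < w; ++lx) {
                    float acc = 0.0f;
                    for (int fy = 0; fy < FW; ++fy)
                        for (int fx = 0; fx < FW; ++fx)
                            acc += filt[fy * FW + fx] * patch[ly + fy][lx + fx];
                    out[(ty + ly) * w + (tx + lx)] = acc;
                }
        }
}
```

With these example sizes, each tile reads PATCH*PATCH = 36 input values once, whereas the direct version does TILE*TILE*FW*FW = 144 reads for the same 16 outputs. That reuse is the reason local memory tends to matter on the GPU. The actual speedup depends on the device, so treat the numbers as illustrative only.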