Question about OpenCL samplers

Hello,

I am very new to OpenCL and I have a question about how samplers are dealt with.

I am dealing mostly with 3D images and as I understand it, I have to do the following:

  • Create an image with createImage3D. If all I want to do is interpolate the image under some transformation, I then create a sampler object, associate it with this 3D image, and get continuous indexing.

I am guessing that on the GPU, the sampler object binds the image to a texture and can use the hardware-accelerated interpolation available there. But what does OpenCL do when the underlying hardware is a multicore CPU? Can a sampler even be used with an image in that case? I cannot test it at the moment as my OpenCL code crashes on my MacBook Air :wink:

Another question: does the sampler object have a lot of memory overhead (does it replicate the data)? I am trying to design an abstract image class where, once the user creates an image, a sampler is automatically associated with it so that resampling can be done. However, I wonder if I should instead create the sampler as needed and then release it.

These questions might be quite n00b and I am sorry for that. However, I would be really grateful if someone could help me with these doubts.

Many thanks,

xarg

I don’t know the internal details but from seeing the way they work, it seems sampler objects are like a macro or a bit-field which tells the read_image*() functions how to read the data.

I guess on a GPU they equate to bound textures, but on a CPU they just equate to a macro or a case in a switch statement as everything is just normal code there. On a CPU you might be better off doing your own interpolation as you can hard-code the data format, whereas the sampler doesn’t contain this information so it still needs to query the image storage type and dimensions at run-time.
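To make the hard-coding point concrete, here is a minimal sketch in plain C of trilinear interpolation on a volume that is known to be tightly packed 32-bit floats. The function names (`sample_trilinear`, `voxel`, `clamp_index`, `floor_to_int`), the x-fastest memory layout, and the clamp-to-edge behaviour are my own assumptions for illustration, not what any OpenCL runtime does internally. It treats integer coordinates as voxel centres, whereas OpenCL's linear filter places texel centres at i + 0.5, so the two differ by a half-voxel shift.

```c
#include <stddef.h>

/* Floor of a float as an int, without needing libm. */
static int floor_to_int(float v)
{
    int i = (int)v;               /* truncates toward zero */
    return ((float)i > v) ? i - 1 : i;
}

/* Clamp an index to [0, n-1], similar in spirit to clamp-to-edge addressing. */
static int clamp_index(int i, int n)
{
    if (i < 0) return 0;
    if (i >= n) return n - 1;
    return i;
}

/* Fetch one voxel from a tightly packed float volume (x varies fastest). */
static float voxel(const float *vol, int nx, int ny, int nz,
                   int x, int y, int z)
{
    x = clamp_index(x, nx);
    y = clamp_index(y, ny);
    z = clamp_index(z, nz);
    return vol[((size_t)z * (size_t)ny + (size_t)y) * (size_t)nx + (size_t)x];
}

/* Trilinear interpolation at continuous voxel coordinates (x, y, z).
   Assumption: integer coordinates lie exactly on voxel centres. */
float sample_trilinear(const float *vol, int nx, int ny, int nz,
                       float x, float y, float z)
{
    int x0 = floor_to_int(x), y0 = floor_to_int(y), z0 = floor_to_int(z);
    float fx = x - (float)x0, fy = y - (float)y0, fz = z - (float)z0;

    /* Interpolate along x on the four edges of the surrounding cell. */
    float c00 = voxel(vol, nx, ny, nz, x0, y0,     z0    ) * (1.0f - fx)
              + voxel(vol, nx, ny, nz, x0 + 1, y0,     z0    ) * fx;
    float c10 = voxel(vol, nx, ny, nz, x0, y0 + 1, z0    ) * (1.0f - fx)
              + voxel(vol, nx, ny, nz, x0 + 1, y0 + 1, z0    ) * fx;
    float c01 = voxel(vol, nx, ny, nz, x0, y0,     z0 + 1) * (1.0f - fx)
              + voxel(vol, nx, ny, nz, x0 + 1, y0,     z0 + 1) * fx;
    float c11 = voxel(vol, nx, ny, nz, x0, y0 + 1, z0 + 1) * (1.0f - fx)
              + voxel(vol, nx, ny, nz, x0 + 1, y0 + 1, z0 + 1) * fx;

    /* Then along y, then along z. */
    float c0 = c00 * (1.0f - fy) + c10 * fy;
    float c1 = c01 * (1.0f - fy) + c11 * fy;
    return c0 * (1.0f - fz) + c1 * fz;
}
```

Because the element type and layout are fixed at compile time, there is no run-time query of the image format anywhere in the sampling path, which is the saving I had in mind.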

One of the main benefits on the GPU, apart from it doing the address calculations for you, is the non-linear memory layout that images normally use, which improves cache locality for certain algorithms.
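As a rough illustration of what a non-linear layout can look like, here is a sketch in C of Morton (Z-order) indexing for a 3D volume. This is only one well-known example of such a layout; the layouts GPUs actually use are vendor-specific and I am not claiming any of them is Morton order. The function names are made up for the example, and each coordinate is assumed to fit in 10 bits.

```c
#include <stdint.h>

/* Spread the low 10 bits of v so that two zero bits follow each bit. */
static uint32_t spread_bits_3d(uint32_t v)
{
    v &= 0x000003ffu;
    v = (v | (v << 16)) & 0xff0000ffu;
    v = (v | (v << 8))  & 0x0300f00fu;
    v = (v | (v << 4))  & 0x030c30c3u;
    v = (v | (v << 2))  & 0x09249249u;
    return v;
}

/* Morton index: interleave the bits of x, y and z (x in the lowest bit). */
uint32_t morton_index_3d(uint32_t x, uint32_t y, uint32_t z)
{
    return spread_bits_3d(x)
         | (spread_bits_3d(y) << 1)
         | (spread_bits_3d(z) << 2);
}

/* Ordinary row-major (linear) index for comparison, x varying fastest. */
uint32_t linear_index_3d(uint32_t x, uint32_t y, uint32_t z,
                         uint32_t nx, uint32_t ny)
{
    return (z * ny + y) * nx + x;
}
```

In a 256 x 256 x 256 volume, stepping from (0, 0, 0) to (0, 0, 1) moves 65536 elements in the linear layout but only 4 in the Morton layout, so the neighbours a 3D interpolation touches are much more likely to share a cache line.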

I don’t see why they would take up any memory; they are just a code construct, not a data one.
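Here is a toy model in C of what I mean, assuming a sampler is nothing more than a few flag bits that the read function branches on. Everything in it (`toy_sampler`, `toy_image1d`, `toy_read_image`, the enum names) is hypothetical and is not the OpenCL API or any real implementation. It is 1D to keep it short, and it uses the i + 0.5 texel-centre convention for linear filtering.

```c
/* Hypothetical flag values; not OpenCL constants. */
enum toy_addressing { TOY_ADDRESS_CLAMP_TO_EDGE = 0, TOY_ADDRESS_REPEAT = 1 };
enum toy_filter     { TOY_FILTER_NEAREST = 0, TOY_FILTER_LINEAR = 1 };

/* The "sampler": three bits of state and no pointer to any pixel data. */
typedef struct {
    unsigned normalized_coords : 1;
    unsigned addressing        : 1;
    unsigned filter            : 1;
} toy_sampler;

/* The image owns the data; the sampler never copies it. */
typedef struct {
    const float *data;
    int width;
} toy_image1d;

/* Floor of a float as an int, without needing libm. */
static int toy_floor(float v)
{
    int i = (int)v;
    return ((float)i > v) ? i - 1 : i;
}

/* Map an out-of-range index back into [0, n-1] per the addressing mode. */
static int toy_resolve(int i, int n, unsigned addressing)
{
    if (addressing == TOY_ADDRESS_REPEAT) {
        i %= n;
        return (i < 0) ? i + n : i;
    }
    if (i < 0) return 0;
    return (i >= n) ? n - 1 : i;
}

/* Read a value; the sampler only selects which code path runs. */
float toy_read_image(const toy_image1d *img, toy_sampler s, float u)
{
    if (s.normalized_coords)
        u *= (float)img->width;          /* [0, 1) -> [0, width) */

    if (s.filter == TOY_FILTER_NEAREST)
        return img->data[toy_resolve(toy_floor(u), img->width, s.addressing)];

    /* Linear filter: texel centres sit at i + 0.5. */
    float t = u - 0.5f;
    int i0 = toy_floor(t);
    float f = t - (float)i0;
    float a = img->data[toy_resolve(i0,     img->width, s.addressing)];
    float b = img->data[toy_resolve(i0 + 1, img->width, s.addressing)];
    return a * (1.0f - f) + b * f;
}
```

In this model the sampler typically occupies a single machine word regardless of how large the image is, and the same sampler value can be reused with any number of images, so creating one up front and keeping it around costs essentially nothing.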

Many thanks for the reply! I will test it on the CPU as soon as I have the code working and see what I get out of it.

I guess doing my own implementation might be much more efficient on the CPU as well, like you say. In my case, the data is always floating-point values, so I can hard-code some of the stuff and the compiler might optimize for it.

Thanks,

xarg