Memory access pattern

Hi all,

I want to store a 2D grid in global memory, where each grid cell holds two floats, and I was wondering what the best layout is. Whenever a thread reads a grid cell, it always needs both floats; there is no situation where I need only the first float or only the second. Additionally, many threads may access the same grid cell simultaneously.

So, given this, my question is: is it more efficient to store my grid as n cl_float2, where n is the number of grid cells, or as n*2 cl_float?

I think the idea is clear.

Thank you.

Use float2 for the kernel arguments; on the hardware I’ve used it is somewhat faster. On the host side it probably won’t make much difference.

It tells the compiler that every access is aligned to a float2 (8 bytes), so it can probably load both values with a single, or at least faster, instruction.
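To illustrate the layout point, here is a minimal host-side sketch in plain C. It assumes a hypothetical `cell2` struct as a stand-in for cl_float2 (two floats, 8-byte aligned); the helper names are made up for illustration. It shows that n float2 cells and n*2 floats occupy the same interleaved bytes, so the difference is only in how a cell is fetched: one 8-byte aligned element versus two separate 4-byte reads. Whether a device compiler actually turns the float2 access into a single load depends on the hardware and compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for cl_float2: two floats, 8-byte aligned. */
typedef struct {
    _Alignas(8) float x;
    float y;
} cell2;

/* float2-style access: the whole cell is one aligned 8-byte element. */
cell2 load_cell_vec(const cell2 *grid, size_t i) {
    return grid[i];
}

/* Flat access: cell i lives at floats 2*i and 2*i + 1, read separately. */
cell2 load_cell_scalar(const float *grid, size_t i) {
    cell2 c;
    c.x = grid[2 * i];
    c.y = grid[2 * i + 1];
    return c;
}

/* Returns 1 if n cells and 2*n floats hold byte-identical data. */
int same_layout(const cell2 *cells, const float *flat, size_t n) {
    assert(sizeof(cell2) == 2 * sizeof(float)); /* no padding between x and y */
    return memcmp(cells, flat, n * sizeof(cell2)) == 0;
}
```

Calling `same_layout` on a `cell2` array and a flat float array filled with the same values in the same order returns 1, which is why the host side is unlikely to care which declaration is used.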

Or … use float4, and process two at once … :wink:

As I understand it, this is handled at compile time: I do not have to do anything special, just declare my array as float2 and the compiler optimizes the access instructions. Correct me if I am wrong.

Thanks, but because of the nature of my problem, I can’t.