Float4* VS Float*

qqchose · November 19, 2012, 11:12am

I’m new to OpenCL. I’m looking at somebody else’s code, and it looks like this:


struct Scene
{
    __global float* vertics;
    ...
};

“vertics” is an array of floats, but it holds both the POSITION and the NORMAL of each vertex.
To get the position, we have this function:


inline float4 GetVertexPosition(__local struct Scene *s, uint vertexID)
{
    __global float* offset = 0;
    offset = s->vertics + vertexID * 8;

    return (float4)(*offset,
                    *(offset + 1),
                    *(offset + 2),
                    1.0f);
}

and to get the normal:


inline float4 GetVertexNormal(__local struct Scene *s, uint vertexID)
{
    __global float* offset = 0;
    offset = s->vertics + vertexID * 8;
    return (float4)(*(offset + 4),
                    *(offset + 5),
                    *(offset + 6),
                    0.0f);
}

I know that when programming in HLSL it’s better to use float4 directly when we can. So I tried this simple change to see if it’s better:


struct Scene
{
    __global float4* vertics;
    ...
};

inline float4 GetVertexPosition(__local struct Scene *s, uint vertexID)
{
    return s->vertics[vertexID * 2];
}


inline float4 GetVertexNormal(__local struct Scene *s, uint vertexID)
{
    return s->vertics[vertexID * 2 + 1];
}

I profiled each version. The first one is faster. Not by a huge margin, but still faster. I thought it would be faster to use float4* directly instead of reading through float* and building a float4.

I use the same buffer in both cases, so the alignment should be the same. The only things I changed are the ones shown above.
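One detail worth checking when comparing the two versions: they start reading at the same offset, but they do not return exactly the same value. The float* version writes w itself (1.0f or 0.0f), while the float4* version returns whatever is stored in the fourth float of the buffer. Below is a minimal host-side sketch in plain C, not OpenCL C. The names vec4, position_scalar, position_vector and check_same_offset are hypothetical and only used for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side stand-in for OpenCL's float4 (hypothetical, illustration only). */
typedef struct { float x, y, z, w; } vec4;

/* Layout taken from the post: 8 floats per vertex,
   position in floats 0..3, normal in floats 4..7. */
enum { FLOATS_PER_VERTEX = 8 };

/* float* variant: reads three floats and supplies w = 1.0f itself. */
vec4 position_scalar(const float *vertices, size_t vertex_id)
{
    const float *p = vertices + vertex_id * FLOATS_PER_VERTEX;
    vec4 r = { p[0], p[1], p[2], 1.0f };
    return r;
}

/* float4* variant: element (vertex_id * 2) of a float4 array starts at
   float index (vertex_id * 2) * 4, and w comes from the buffer. */
vec4 position_vector(const float *vertices, size_t vertex_id)
{
    const float *p = vertices + (vertex_id * 2) * 4;
    vec4 r = { p[0], p[1], p[2], p[3] };
    return r;
}

/* Both variants begin at the same float offset for any vertex. */
void check_same_offset(size_t vertex_id)
{
    assert(vertex_id * FLOATS_PER_VERTEX == (vertex_id * 2) * 4);
}
```

If the fourth float of each position in the buffer is not already 1.0f (and 0.0f for normals), the two versions produce different results, so the rest of the kernel may not be doing the same work in both measurements.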

Can somebody explain why it’s faster to use float*?

Thanks

clint3112 · November 19, 2012, 11:31pm

Hi,

There shouldn’t be such a huge difference in execution time. The main reason why float4 is faster on a GPU is that the architecture is optimized for float4 data. The memory controller always fetches data in chunks of 128 bytes. Look up coalesced memory access to get a better idea of the problem.
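To make the 128-byte chunk idea concrete, here is a rough sketch in plain C that counts how many 128-byte segments one load instruction touches across a group of neighbouring work-items. It is a simplified model under stated assumptions: the function name segments_touched is hypothetical, the group size of 32 work-items in the usage note is an assumption, and caches are ignored entirely, so it does not predict measured timings.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the distinct segments touched when `work_items` consecutive
   work-items each read `bytes_per_read` bytes, with work-item i starting
   at byte offset i * stride_bytes. Addresses never decrease with i, so it
   is enough to remember the lowest segment index not yet counted. */
size_t segments_touched(size_t work_items, size_t stride_bytes,
                        size_t bytes_per_read, size_t segment_bytes)
{
    assert(segment_bytes > 0 && bytes_per_read > 0);

    size_t count = 0;
    size_t next_new = 0;  /* lowest segment index not counted so far */

    for (size_t i = 0; i < work_items; ++i) {
        size_t first = (i * stride_bytes) / segment_bytes;
        size_t last = (i * stride_bytes + bytes_per_read - 1) / segment_bytes;

        if (first < next_new)
            first = next_new;          /* skip segments already counted */
        if (first <= last) {
            count += last - first + 1; /* newly touched segments */
            next_new = last + 1;
        }
    }
    return count;
}
```

With the layout from the question (32 bytes per vertex), segments_touched(32, 32, 4, 128) and segments_touched(32, 32, 16, 128) both give 8: a single float load and a single float4 load touch the same number of segments, but the float* version issues three such loads per call. For comparison, tightly packed floats, segments_touched(32, 4, 4, 128), touch only 1 segment.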

Greetings,
clint3112

qqchose · November 20, 2012, 4:41am

Thanks, I will read about coalesced memory access.

I know the GPU is optimised for float4 :). All registers are float4. That’s why I tried this change :p.