I'm new to OpenCL. I'm reading code written by somebody else, and it looks like this:
struct Scene
{
    __global float* vertics;
    ...
};
"vertics" is an array of float, but it holds both the POSITION and the NORMAL of each vertex, 8 floats per vertex.
To get the position, we have this function:
inline float4 GetVertexPosition(__local struct Scene *s, uint vertexID)
{
    // Each vertex takes 8 floats; the position is in the first three.
    __global float* offset = s->vertics + vertexID * 8;
    return (float4)(*offset,
                    *(offset + 1),
                    *(offset + 2),
                    1.0f);
}
And to get the normal:
inline float4 GetVertexNormal(__local struct Scene *s, uint vertexID)
{
    // The normal starts 4 floats after the position of the same vertex.
    __global float* offset = s->vertics + vertexID * 8;
    return (float4)(*(offset + 4),
                    *(offset + 5),
                    *(offset + 6),
                    0.0f);
}
I know that in HLSL it's better to use float4 directly whenever we can, so I tried this simple change to see if it would be faster:
struct Scene
{
    __global float4* vertics;
    ...
};
inline float4 GetVertexPosition(__local struct Scene *s, uint vertexID)
{
    return s->vertics[vertexID * 2];
}
inline float4 GetVertexNormal(__local struct Scene *s, uint vertexID)
{
    return s->vertics[vertexID * 2 + 1];
}
I profiled both versions. The first one (float*) is faster. Not a huge difference, but still faster. I thought it would be faster to use float4* directly instead of reading through a float* and building a float4.
I use the same buffer in both cases, so the alignment should be the same. The only thing I changed is what I showed above.
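To double-check that both versions address the same data, here is a small host-side sketch in plain C (not OpenCL). The Float4 struct, the function names, and the assumption that the fourth float of each group is unused padding are mine and are not from the original code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side stand-in for OpenCL's float4 (four packed floats). */
typedef struct { float x, y, z, w; } Float4;

/* Assumed layout per vertex: px py pz pad nx ny nz pad */
#define FLOATS_PER_VERTEX 8

/* Mirrors the float* version: read three scalars, then set w by hand. */
static Float4 position_scalar(const float *vertics, unsigned vertexID)
{
    const float *offset = vertics + (size_t)vertexID * FLOATS_PER_VERTEX;
    Float4 p = { offset[0], offset[1], offset[2], 1.0f };
    return p;
}

/* Mirrors the float4* version: element vertexID * 2 of a float4 array,
   i.e. four consecutive floats copied exactly as stored in the buffer. */
static Float4 position_vector(const float *vertics, unsigned vertexID)
{
    Float4 p;
    memcpy(&p, vertics + (size_t)vertexID * 2 * 4, sizeof p);
    return p;
}

/* Returns 1 when the x, y and z components match, ignoring w. */
static int same_xyz(Float4 a, Float4 b)
{
    return a.x == b.x && a.y == b.y && a.z == b.z;
}
```

If this sketch matches the real layout, both versions return the same x, y and z, but w differs: the float* version always writes 1.0f (or 0.0f for normals), while the float4* version returns whatever is stored in the fourth float. So the two kernels would only give identical results if the buffer already holds 1.0f after each position and 0.0f after each normal.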
Can somebody explain why it's faster to use float*?
Thanks