I have some functions that work with float8. During optimization it turned out that math on native-length vectors, such as float4, runs faster. I don't want to rewrite all the code I have; I just want to split each float8 function into two float4 function calls.
void f_native(float4 a)
{
    // do something in vector4 math
}

void f(float8 a)
{
    float4* ta = &a;
    f_native(ta[0]);
    f_native(ta[1]);
}
The NVIDIA SDK issues warnings about this code. Is there a proper way to do this conversion without expensive performance overhead?
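
One possible approach, as a minimal sketch that assumes the code is OpenCL C: the warnings most likely come from assigning a float8* to a float4*, which are incompatible pointer types. OpenCL C vector types expose .lo and .hi component selectors, so the two halves of a float8 can be passed by value without any pointer cast. The body of f_native below is only a placeholder, and whether this costs nothing extra on a given device is an assumption that should be checked by profiling.

```c
// Placeholder for the float4 routine from the question (body is hypothetical).
void f_native(float4 a)
{
    // do something in vector4 math
}

void f(float8 a)
{
    // a.lo selects components 0-3 and a.hi selects components 4-7
    // (equivalent to a.s0123 and a.s4567), so no pointer cast is needed.
    f_native(a.lo);
    f_native(a.hi);
}
```

Because the halves are selected by component name rather than through a reinterpreted pointer, the compiler has no type mismatch to warn about.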