float3?

Are you serious, no float3?

mtm

There’s a post in the “Suggestions for next release” about this: viewtopic.php?f=41&t=2121

Though whether it should be included is debatable. Most SIMD hardware is 4 wide (or some other power of 2). You can always pack into a float4 like the following:

float4 p1 = {x1, y1, z1, 0.0f};
float4 p2 = {x2, y2, z2, 0.0f};
float d = distance(p1, p2);

The hope is that on scalar architectures (NVIDIA), the compiler is smart enough to notice the zero component and skip the wasted work. I have no idea whether this is true.

I’ve seen a post by an Apple employee warning that when doing this you should be careful to pack a somewhat meaningful value into the fourth component. Otherwise you can end up with the hardware throwing floating-point exceptions, which will kill performance. For example, using the above definitions, this would be very bad:

float4 v = p1/p2;

The fourth component here is a divide by zero (0.0f / 0.0f). From this point of view a float3 would be really nice: it saves all that hassle, and the compiler can do whatever is most appropriate for the targeted architecture. Then again, OpenCL is supposed to be “low level”.
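The failure mode can be reproduced on the host with a short C sketch. The `vec4` struct and the `vec4_div` and `padding_lane_is_nan` helpers are hypothetical stand-ins for float4 and its component-wise `/` operator. Assuming IEEE 754 arithmetic (and no fast-math style compiler flags), the padded lane computes 0.0f / 0.0f and yields NaN. Whether that actually costs performance depends on the hardware.

```c
#include <math.h>

/* Hypothetical stand-in for OpenCL's float4 (not the real built-in type). */
typedef struct { float x, y, z, w; } vec4;

/* Component-wise division, as '/' behaves on OpenCL vector types. */
static vec4 vec4_div(vec4 a, vec4 b) {
    vec4 r = { a.x / b.x, a.y / b.y, a.z / b.z, a.w / b.w };
    return r;
}

/* Returns nonzero if the padding (w) lane of a / b is NaN. */
static int padding_lane_is_nan(vec4 a, vec4 b) {
    return isnan(vec4_div(a, b).w);
}
```

With both w lanes set to 0.0f the check reports NaN; with a nonzero w in the divisor it does not.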

-Brian

If it uses a float4 behind the scenes that is fine with me, assuming performance is decent.
But using float3 notation in an algorithm is specifically what I’m after.
I just don’t want to use an apple for an orange.
Seems to me that float3 would be a commonly used enough item to include…

Besides, if I’m doing operations on 3-component vector data and I have to hand-code it using scalar floats, I am wasting power efficiency (no SIMD);
if I code it using float4 operations, I am also wasting power efficiency (on the unused w component).

If you are going to be wasting power efficiency anyway, might as well just make it convenient to use.

mtm

One other thing I forgot to mention.
If float3 were a built-in, and some hardware came out in the future that did natively support it,
then the more descriptive nature of explicitly using a float3 would allow further optimization in performance and power at the time of that hardware’s release.
Maybe some clever compiler engineer using dataflow analysis could optimize it anyway, but personally I think the explicit approach is more elegant.

mtm

I’ve seen a post by an Apple employee warning that when doing this you should be careful to pack a somewhat meaningful value into the fourth component. Otherwise you can end up with the hardware throwing floating-point exceptions, which will kill performance. For example, using the above definitions, this would be very bad.

When using 4x4 matrices in graphics with 3-component vectors (float[3]), the w component is usually presumed to be 1.0f, so that would make for a better fourth component.
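Here is a small host-side C sketch of that suggestion, using a hypothetical `vec4` struct and helpers (`pack3_w1`, `vec4_sub`, `vec4_dot`) rather than the real float4 built-ins. Padding with 1.0f keeps a component-wise division finite and leaves differences, and therefore distances, untouched because the two w values cancel. One caveat worth checking in your own code: operations that sum over all four lanes, such as a dot product of a padded vector with itself, do pick up the extra 1.0f.

```c
/* Hypothetical stand-in for OpenCL's float4 (not the real built-in type). */
typedef struct { float x, y, z, w; } vec4;

/* Pad a 3-component point with w = 1.0f, the usual homogeneous-coordinate value. */
static vec4 pack3_w1(float x, float y, float z) {
    vec4 v = { x, y, z, 1.0f };
    return v;
}

/* Component-wise difference: the two w lanes cancel to 0.0f. */
static vec4 vec4_sub(vec4 a, vec4 b) {
    vec4 r = { a.x - b.x, a.y - b.y, a.z - b.z, a.w - b.w };
    return r;
}

/* Four-lane dot product: note that the w lane participates in the sum. */
static float vec4_dot(vec4 a, vec4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}
```

For example, with a = (1, 2, 3) and b = (4, 6, 3) padded this way, the squared distance `vec4_dot(d, d)` for `d = vec4_sub(a, b)` is 25 as expected, while `vec4_dot(a, a)` is 15 rather than the 3-component value of 14.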