I have a kernel that needs to compute the perp dot product of several pairs of 2D vectors. Which way of coding it would give the best performance on both NVIDIA and AMD GPUs?
The obvious approach I wrote is:
float r_num = (v1.s1 * v2.s0) - (v1.s0 * v2.s1);
(where v1 and v2 are two float2 vectors)
But… it feels like a pity that I cannot use a single built-in function such as dot() (which I assume is better optimized than a dot product written out by hand).
The idea of passing the vectors pre-transformed so that they’re “pre-perpendicularized” is not an option because the kernel also needs the original vectors.
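To make the comparison concrete, here is a minimal sketch of the two variants in question, written in plain C so it can be compiled and checked on the host. The names vec2f, dot2, perp_dot_manual and perp_dot_via_dot are hypothetical stand-ins (vec2f for float2, dot2 for the built-in dot()), not part of any API. The second variant builds the perpendicular of v2 in a temporary, so the original vectors stay available to the rest of the kernel.

```c
/* Hypothetical stand-in for a float2; s0/s1 mirror the .s0/.s1 components. */
typedef struct { float s0, s1; } vec2f;

/* Plain 2D dot product, standing in for the built-in dot(). */
static float dot2(vec2f a, vec2f b) {
    return a.s0 * b.s0 + a.s1 * b.s1;
}

/* Variant 1: the hand-written expression from the post. */
static float perp_dot_manual(vec2f v1, vec2f v2) {
    return (v1.s1 * v2.s0) - (v1.s0 * v2.s1);
}

/* Variant 2: rotate v2 by 90 degrees into a temporary, then take a
 * regular dot product. v1 and v2 themselves are left unchanged. */
static float perp_dot_via_dot(vec2f v1, vec2f v2) {
    vec2f v2_perp = { -v2.s1, v2.s0 };
    return dot2(v1, v2_perp);
}
```

Assuming the kernel is OpenCL C, the second variant would presumably be written as dot(v1, (float2)(-v2.s1, v2.s0)). Both forms compute the same value; whether either one is faster on a given device is something I would have to measure rather than assume.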
What do you think is the best, from a performance point of view? Note that the kernel will run on GPU, either AMD or NVIDIA.
Thank you for any advice!