Most efficient way of computing perp dot products in OpenCL?

I have a kernel that needs to compute the perp dot product of several pairs of 2D vectors. I was wondering what the preferred way of coding it is, so that it performs well on both NVIDIA and AMD GPUs.

The obvious approach I wrote is:

float r_num = (v1.s1 * v2.s0) - (v1.s0 * v2.s1);
(where v1 and v2 are two float2 vectors)

But I feel it’s a pity that I cannot use a single built-in function such as dot() (which I guess would be optimized better than a dot product written by hand).

The idea of passing the vectors pre-transformed so that they’re “pre-perpendicularized” is not an option because the kernel also needs the original vectors.

What do you think is best from a performance point of view? Note that the kernel will run on a GPU, either AMD or NVIDIA.

Thank you for any advice!

I’m not sure what a “perp dot product” is. What you show is definitely not a dot(); it looks more like a cross() to me.
That fits with the fact that OpenCL 1.2 only provides cross() for 3- and 4-component vectors, with nothing for 2-component ones.
Don’t worry about it: just write the expression out by hand, as you did.
In the past, dot/cross operations did map to special hardware (to exploit the parallel multiplies). Those days are long gone. GPUs are now scalar (NVIDIA) or “clustered scalar” (AMD), and they come with fairly sophisticated compilers that have access to more instructions than OpenCL has built-in functions.
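
If you still want to see how the hand-written expression relates to dot(), here is a minimal sketch in plain C. The vec2 struct and the helper names (perp_dot_direct, dot2, perp_dot_via_dot) are made up for illustration and only stand in for OpenCL's float2 and built-in dot(). It keeps the same sign convention as the expression in the question and shows that rotating v1 by 90 degrees first and then taking an ordinary dot product gives the same value.

```c
/* Stand-in for OpenCL's float2 so the sketch compiles as plain C
   (hypothetical type, not part of any OpenCL header). */
typedef struct { float x, y; } vec2;

/* Direct form, same sign convention as the question:
   v1.s1 * v2.s0 - v1.s0 * v2.s1 */
static float perp_dot_direct(vec2 v1, vec2 v2) {
    return v1.y * v2.x - v1.x * v2.y;
}

/* Ordinary 2D dot product, standing in for the built-in dot(). */
static float dot2(vec2 a, vec2 b) {
    return a.x * b.x + a.y * b.y;
}

/* Same value via a dot product: build the perpendicular of v1
   locally (v1 itself is left untouched), then dot it with v2. */
static float perp_dot_via_dot(vec2 v1, vec2 v2) {
    vec2 p = { v1.y, -v1.x };   /* v1 rotated by 90 degrees */
    return dot2(p, v2);         /* v1.y * v2.x - v1.x * v2.y */
}
```

In a kernel the second form would presumably be written as dot((float2)(v1.y, -v1.x), v2). Both forms boil down to two multiplies and a subtraction, so I would not expect either to be faster; check the generated code if it really matters.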

Literally the first hit on Google for that exact phrase.

Thank you for giving me access to a notion I haven’t used in 20 years; you can bet it’ll be forgotten by tomorrow.