Is there a single instruction to calculate the sum of all components of a float4, e.g., in OpenCL?
Example:
float4 v;
float desiredResult = v.x + v.y + v.z + v.w;
Is there an SSE-like set of instructions?
I have also wondered why the OpenCL specification has no built-in sum function. Perhaps it is because a compiler could generate one by optimizing the built-in dot function, e.g., sum = dot(v, (float4)(1)). However, I haven’t checked this yet.
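To make the dot idea concrete, here is a minimal sketch in plain C that emulates a float4 on the host and compares three ways of summing its components. The struct float4_emu and the three helper names are hypothetical and exist only for illustration; they are not part of OpenCL. The sketch only shows that the arithmetic agrees. It says nothing about whether a particular OpenCL compiler reduces any of these forms to a single instruction.

```c
/* Host-side stand-in for an OpenCL float4 (hypothetical type, for illustration only). */
typedef struct { float x, y, z, w; } float4_emu;

/* Direct component sum, as written in the question. */
static float sum_direct(float4_emu v) {
    return v.x + v.y + v.z + v.w;
}

/* Same arithmetic as dot(v, (float4)(1)): multiply each component by 1, then add. */
static float sum_via_dot(float4_emu v) {
    const float4_emu ones = {1.0f, 1.0f, 1.0f, 1.0f};
    return v.x * ones.x + v.y * ones.y + v.z * ones.z + v.w * ones.w;
}

/* Pairwise reduction; in OpenCL C this could be written with swizzles as
   float2 t = v.xy + v.zw; return t.x + t.y; */
static float sum_pairwise(float4_emu v) {
    float a = v.x + v.z;
    float b = v.y + v.w;
    return a + b;
}
```

Note that the three versions add the components in different orders, so for general floating-point inputs the results may differ in the last bits even though they are mathematically equal.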