Hello!
Let’s say x, y and z are three cl_mem buffers of sizeof(float)*9 bytes each … I was wondering whether this kind of kernel is safe/portable. I couldn’t find anything in the standard saying whether it is legal or not.
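// "add" is a placeholder name; the original snippet had none
__kernel void add(__global float4* x, __global float4* y, __global float4* z){
    // First 8 floats: two float4 additions
    z[0] = x[0] + y[0];
    z[1] = x[1] + y[1];
    // 9th float: view the same buffers as scalar floats (the cast must keep __global)
    __global float* new_x = (__global float*)x;
    __global float* new_y = (__global float*)y;
    __global float* new_z = (__global float*)z;
    new_z[8] = new_x[8] + new_y[8];
}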
In my case, padding x, y, z with zeros to match the alignment is not really an option, and vloadn/vstoren seem to induce a lot of overhead… Vectorization is a benefit I don’t want to lose when x, y, z are of size 4 000 001.
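To make the size arithmetic concrete, here is a minimal sketch in plain host-side C (not OpenCL C) of how a buffer of n floats splits into whole float4 groups plus a scalar tail. The names `vec_count`, `tail_start` and `add_with_tail` are hypothetical, and the inner lane loop only stands in for what a real float4 addition would do; it says nothing about whether the pointer cast in the kernel is legal.

```c
#include <stddef.h>

/* Number of whole float4 groups contained in n floats. */
size_t vec_count(size_t n) { return n / 4; }

/* Index of the first float that is not covered by a whole float4 group. */
size_t tail_start(size_t n) { return (n / 4) * 4; }

/* Models z = x + y: float4-sized chunks first, then the scalar tail. */
void add_with_tail(const float *x, const float *y, float *z, size_t n)
{
    size_t groups = vec_count(n);
    for (size_t g = 0; g < groups; ++g) {
        /* Stands in for z4[g] = x4[g] + y4[g] on a float4 view of the data. */
        for (size_t lane = 0; lane < 4; ++lane) {
            size_t i = 4 * g + lane;
            z[i] = x[i] + y[i];
        }
    }
    /* Leftover elements (0 to 3 of them) handled as scalars. */
    for (size_t i = tail_start(n); i < n; ++i)
        z[i] = x[i] + y[i];
}
```

With n = 9 this gives two groups and a tail starting at index 8, which is the layout the kernel above hard-codes. With n = 4 000 001 it gives 1 000 000 groups and a single tail element at index 4 000 000.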
Thank you!
Edit:
According to the standard:
Casting a pointer to a new type represents an unchecked assertion that the address is correctly aligned. The developer will also need to know the endianness of the OpenCL device and the endianness of the data
What does endianness mean when it comes to vector types? Does it apply to the whole vector or each individual element?