Problem with float values in OpenCL on CPU

I have this code in OpenCL:


__kernel void vector_add(__global float *C) {
    int i = get_global_id(0);
    float t = 3.5f;
    C[i] = t;
}

When I run it on the GPU, the value returned in C is 3.5, which is correct.
But when I run it on the CPU, the value returned is 3.0.
This happens with any float value.

But if I run this code:


__kernel void vector_add(const float A, __global float *C) {
    int i = get_global_id(0);
    C[i] = A;
}

When I pass the value 3.5 to A, this runs correctly on both the CPU and the GPU.

Why does this happen?

This looks like a weird (and wrong) compiler optimization. The difference between the two kernels is that in the second one the value arrives as a kernel argument, so the compiler cannot see it at compile time and can’t perform this optimization on it.
Can you provide more details? Which OpenCL implementation are you using?

Some directions to check:

  1. Check that you’re compiling the kernel without any math optimization flags. In addition, add the “-O0” flag (optimization level 0). Although it is not documented, most compilers will respect it. The option the OpenCL specification defines for disabling optimizations is “-cl-opt-disable”.
  2. Try adding explicit casts to the kernel, as below. They are unnecessary, but may expose the compilation problem:

__kernel void vector_add(__global float *C) {
    int i = get_global_id(0);
    float t = (float) 3.5f;
    C[i] = (float) t;
}
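
For the first check, here is a minimal host-side sketch in C. It assumes you pass a build-options string to clBuildProgram. The helper find_math_opt_flag and the constant DEBUG_BUILD_OPTIONS are hypothetical names made up for illustration, not part of the OpenCL API. The flags in the table are the math-related build options from the OpenCL specification. The snippet deliberately avoids including the OpenCL headers so it compiles on its own, and the real clBuildProgram call appears only in a comment.

```c
#include <stddef.h>
#include <string.h>

/* Math-related build options defined by the OpenCL specification. */
static const char *const MATH_OPT_FLAGS[] = {
    "-cl-mad-enable",
    "-cl-no-signed-zeros",
    "-cl-unsafe-math-optimizations",
    "-cl-finite-math-only",
    "-cl-fast-relaxed-math",
};

/* Hypothetical helper (not an OpenCL function): returns the first math
 * optimization flag found in `options`, or NULL if none is present. */
static const char *find_math_opt_flag(const char *options)
{
    size_t count = sizeof(MATH_OPT_FLAGS) / sizeof(MATH_OPT_FLAGS[0]);
    if (options == NULL)
        return NULL;
    for (size_t i = 0; i < count; ++i) {
        if (strstr(options, MATH_OPT_FLAGS[i]) != NULL)
            return MATH_OPT_FLAGS[i];
    }
    return NULL;
}

/* Assumed options for a debugging build: the standard flag that disables
 * all optimizations. */
static const char *const DEBUG_BUILD_OPTIONS = "-cl-opt-disable";

/* With a real program and device, the string is the fourth argument of
 * clBuildProgram:
 *     clBuildProgram(program, 1, &device, DEBUG_BUILD_OPTIONS, NULL, NULL);
 */
```

If find_math_opt_flag returns a non-NULL value for the options you currently use, try removing that flag first. Then rebuild with DEBUG_BUILD_OPTIONS and check whether the CPU still returns 3.0.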