Atomic Operation on global memory with float values

Hello,
I am using an AMD Radeon RX460 Card with the AMD APP SDK 3.0 with OCL 2.0 and Visual Studio.

My kernel does a lot of computation and, at the end, sums certain results into global memory using atomic operations:

[NOTE]atomic_add(bufOut + 2 * local_id + offset, f0);[/NOTE]

Here bufOut is a __global float* and f0 is a float value.

[NOTE]C:\Users\albre_fl\AppData\Local\Temp\OCL7444T1.cl:1508:3: error: no matching function for call to ‘atomic_add’
atomic_add(bufOut + 2 * me + 0 + 0 , f0);
^~~~~~~~~~
c:\constructicon\builds\gfx wo\17.10\stream\opencl\compiler\clc2\ocl-headers\build\wNow\B_rel\opencl12_builtins.h:4256:35: note: candidate function not viable: no known conversion from ‘__global float *’ to ‘volatile __global int *’ for 1st argument
int __attribute__((overloadable)) atomic_add(volatile __global int *p, int val);

[/NOTE]

It seems that atomic_add can only add int values, not float values. But according to the documentation (https://www.khronos.org/registry/OpenCL/sdk/2.0/docs/man/xhtml/atomicFunctions.html) it should be possible, right?
I also found this post: https://forums.khronos.org/showthread.php/12935-Atomic-extension. There the problem was that a float value was passed instead of a pointer, and after correcting that it seemed to work.
So what am I doing wrong here?

Thanks

Maybe you can turn this into an “add” version:


// Function to perform the atomic max
inline void AtomicMax(volatile __global float *source, const float operand) {
    union {
        unsigned int intVal;
        float floatVal;
    } newVal;
    union {
        unsigned int intVal;
        float floatVal;
    } prevVal;
    do {
        prevVal.floatVal = *source;
        newVal.floatVal = max(prevVal.floatVal,operand);
    } while (atomic_cmpxchg((volatile __global unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
}
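
An "add" version of the function above could look like the following sketch. It keeps the same compare-and-swap retry loop and only replaces max with +. The name AtomicAddFloat is hypothetical, and the #ifndef block is a host-only shim (an assumption, using GCC/Clang builtins) so the function can also be compiled and checked as plain C; it is not needed inside a .cl file.

```c
#ifndef __OPENCL_VERSION__
/* Host-only shim (assumption: GCC/Clang). Lets the function below compile
   as plain C for a quick check. Leave it out of a real .cl file. */
#define __global
#define atomic_cmpxchg(p, cmp, val) __sync_val_compare_and_swap((p), (cmp), (val))
#endif

/* Atomically performs *source += operand on a float in global memory.
   Same retry loop as AtomicMax, with + in place of max. */
void AtomicAddFloat(volatile __global float *source, const float operand) {
    union { unsigned int intVal; float floatVal; } newVal;
    union { unsigned int intVal; float floatVal; } prevVal;
    do {
        prevVal.floatVal = *source;                     /* snapshot the current value */
        newVal.floatVal  = prevVal.floatVal + operand;  /* value we would like to store */
        /* Store only if *source still holds the snapshot; otherwise retry. */
    } while (atomic_cmpxchg((volatile __global unsigned int *)source,
                            prevVal.intVal, newVal.intVal) != prevVal.intVal);
}
```

With this in place, the failing call from the question would become AtomicAddFloat(bufOut + 2 * local_id + offset, f0);. Every retry costs another round trip to memory, so the loop gets slower the more work-items hit the same address.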

Thank you, but I had already found this kind of solution myself. Sadly, this kind of function seems to be pretty slow.

There should be a faster way, especially if the documentation says so.

Then do the summation in local memory first, across all work-items of the work-group, then synchronize and do a single atomic update on global memory per work-group.
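
A minimal sketch of that idea, under stated assumptions: the kernel name sum_via_local, its arguments, and the helper AtomicAddFloat are all hypothetical, the work-group size is assumed to be a power of two, and in is assumed to hold one float per work-item. Each work-group reduces its values in local memory and then issues one global update instead of one per work-item. The #ifndef block is only a host-side shim (GCC/Clang assumed) that emulates a work-group of a single work-item so the file also compiles as plain C.

```c
#ifndef __OPENCL_VERSION__
/* Host-only shim (assumption: GCC/Clang): emulates a work-group of ONE
   work-item so this also compiles as plain C. Leave it out of a .cl file. */
#include <stddef.h>
#define __kernel
#define __global
#define __local
#define CLK_LOCAL_MEM_FENCE 0
#define barrier(flags)      ((void)0)
#define get_local_id(dim)   ((size_t)0)
#define get_local_size(dim) ((size_t)1)
#define get_global_id(dim)  ((size_t)0)
#define atomic_cmpxchg(p, cmp, val) __sync_val_compare_and_swap((p), (cmp), (val))
#endif

/* Float add built on a compare-and-swap retry loop (hypothetical helper). */
void AtomicAddFloat(volatile __global float *source, const float operand) {
    union { unsigned int intVal; float floatVal; } newVal, prevVal;
    do {
        prevVal.floatVal = *source;
        newVal.floatVal  = prevVal.floatVal + operand;
    } while (atomic_cmpxchg((volatile __global unsigned int *)source,
                            prevVal.intVal, newVal.intVal) != prevVal.intVal);
}

__kernel void sum_via_local(__global const float *in,
                            volatile __global float *out,
                            __local float *scratch)
{
    const size_t lid = get_local_id(0);
    const size_t lsz = get_local_size(0);

    /* Step 1: every work-item writes its value into local memory. */
    scratch[lid] = in[get_global_id(0)];
    barrier(CLK_LOCAL_MEM_FENCE);

    /* Step 2: tree reduction inside the work-group (power-of-two size assumed). */
    for (size_t s = lsz / 2; s > 0; s >>= 1) {
        if (lid < s) {
            scratch[lid] += scratch[lid + s];
        }
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    /* Step 3: one global atomic update per work-group instead of one per work-item. */
    if (lid == 0) {
        AtomicAddFloat(out, scratch[0]);
    }
}
```

On the host, the scratch argument would be sized with clSetKernelArg(kernel, 2, local_size * sizeof(float), NULL). Contention on out then scales with the number of work-groups rather than the number of work-items.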

I have the same problem

You can use two integers: one for the integer part (an easy single operation) and one for the fractional part (bitwise interpretation).

Then, when you're done with these, convert the two integers A and B to floats as A.0 and 0.B and add them.

Maybe you can even use one integer per digit after the decimal point.

before:

b=integerPart(floatingPointPart(num)*10);

c=integerPart(floatingPointPart(num)*100);

after:

atomic_add(… a …)
atomic_add(… b*10 …)
atomic_add(… c*100 …)
atomic_add(… d*1000 …)

A.BCDEFGH

num = A + B/10 + C/100 + D/1000 + E/10000 + F/100000 + G/1000000 + H/10000000 in the end.
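
The digit scheme above is essentially fixed-point accumulation. The sketch below is a simplification, not the exact per-digit layout from the post: it keeps a single int accumulator and scales every value by a constant before the built-in integer atomic_add. FIXED_SCALE, AtomicAddFixed and FixedToFloat are hypothetical names, the scale of 1000 (three decimal digits) is an arbitrary assumption, and the #ifndef block is a host-only shim (GCC/Clang assumed) so the code can be checked as plain C.

```c
#ifndef __OPENCL_VERSION__
/* Host-only shim (assumption: GCC/Clang). Leave it out of a real .cl file. */
#define __global
#define atomic_add(p, v) __sync_fetch_and_add((p), (v))
#endif

/* Assumed precision: 3 decimal digits. With a 32-bit int accumulator the
   running sum must stay within roughly +/- 2.1 million at this scale. */
#define FIXED_SCALE 1000.0f

/* Adds value to an int accumulator as a scaled integer, rounding half away
   from zero. Uses the integer atomic_add that OpenCL C provides. */
void AtomicAddFixed(volatile __global int *acc, const float value) {
    const float scaled = value * FIXED_SCALE;
    const int   fixed  = (int)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
    atomic_add(acc, fixed);
}

/* Converts the accumulated fixed-point sum back to a float. */
float FixedToFloat(const int acc) {
    return (float)acc / FIXED_SCALE;
}
```

Digits beyond the chosen scale are rounded away on every add, so this only fits when the value range and required precision are known in advance. In exchange, integer addition is order-independent, so the sum does not depend on which work-item gets there first.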