Hello,
I am using an AMD Radeon RX 460 card with the AMD APP SDK 3.0 (OpenCL 2.0) and Visual Studio.
My kernel performs many operations and, at the end, sums some of the results into global memory using atomic operations:
[NOTE]atomic_add(bufOut + 2 * local_id + offset, f0);[/NOTE]
Here bufOut is a __global float* and f0 is a float value. The compiler rejects the call with:
[NOTE]C:\Users\albre_fl\AppData\Local\Temp\OCL7444T1.cl:1508:3: error: no matching function for call to ‘atomic_add’
atomic_add(bufOut + 2 * me + 0 + 0 , f0);
^~~~~~~~~~
c:\constructicon\builds\gfx wo\17.10\stream\opencl\compiler\clc2\ocl-headers\build\wNow\B_rel\opencl12_builtins.h:4256:35: note: candidate function not viable: no known conversion from ‘__global float *’ to ‘volatile __global int *’ for 1st argument
int __attribute__((overloadable)) atomic_add(volatile __global int *p, int val);
…
[/NOTE]
It seems that atomic_add can only add int values, not float values. But according to the documentation (https://www.khronos.org/registry/OpenCL/sdk/2.0/docs/man/xhtml/atomicFunctions.html) it should be possible, right?
I also found this post: https://forums.khronos.org/showthread.php/12935-Atomic-extension. There the problem was that a float value was passed instead of a pointer, and after correcting that it seemed to work.
So what am I doing wrong here?
Thanks
Maybe you can turn this into an “add” version:
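// Atomic max on a float in global memory, built on integer compare-and-swap.
inline void AtomicMax(volatile __global float *source, const float operand) {
    // The unions reinterpret the float's bits as an unsigned int for atomic_cmpxchg.
    union {
        unsigned int intVal;
        float floatVal;
    } newVal;
    union {
        unsigned int intVal;
        float floatVal;
    } prevVal;
    do {
        // Read the current value and compute the candidate result.
        prevVal.floatVal = *source;
        newVal.floatVal = max(prevVal.floatVal, operand);
        // Retry if another work-item changed *source in the meantime.
    } while (atomic_cmpxchg((volatile __global unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
}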
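For readers following along, here is a minimal sketch of what the "add" variant of that loop could look like. It is written as host-side C11 (stdatomic.h) rather than OpenCL C so that it can be compiled and tested on its own; the helper names bits_to_float, float_to_bits and atomic_add_float are made up for illustration. In a kernel, the same structure would presumably use atomic_cmpxchg on the reinterpreted pointer, exactly as AtomicMax does, with max replaced by a plain addition.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: reinterpret a float's bits as a 32-bit word and back. */
float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

uint32_t float_to_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Atomically performs "*target += operand" on a float stored in a 32-bit
   atomic word, and returns the value that was there before the addition. */
float atomic_add_float(_Atomic uint32_t *target, float operand) {
    uint32_t expected = atomic_load(target);
    uint32_t desired;
    do {
        desired = float_to_bits(bits_to_float(expected) + operand);
        /* If the exchange fails, expected is refreshed with the current
           contents of *target and the sum is recomputed. */
    } while (!atomic_compare_exchange_weak(target, &expected, desired));
    return bits_to_float(expected);
}
```

Like AtomicMax, this retries whenever another thread wins the race, so it will be slow when many threads target the same address.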
Thank you, but I had already found this kind of solution myself. Sadly, this kind of function seems to be pretty slow.
There should be a faster way, especially if the documentation says it is possible.
Then do the summation in local memory first, across all work-items of the work-group, synchronize, and only then update global memory atomically.
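A rough sketch of that idea, again as host-side C11 so it can be run without a device. The scratch array stands in for __local memory, each pass of the stride loop corresponds to one barrier-separated step in a real kernel, and GROUP_SIZE, reduce_group, global_add and accumulate are all hypothetical names. The point is only that each group issues one atomic update instead of GROUP_SIZE of them.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define GROUP_SIZE 8 /* hypothetical work-group size, must be a power of two */

typedef union { uint32_t bits; float value; } FloatBits;

/* Tree reduction of one group's values inside a scratch buffer
   (the stand-in for __local memory). Returns the group's partial sum. */
float reduce_group(const float *values) {
    float scratch[GROUP_SIZE];
    for (size_t i = 0; i < GROUP_SIZE; ++i)
        scratch[i] = values[i];
    for (size_t stride = GROUP_SIZE / 2; stride > 0; stride /= 2) {
        /* In a kernel, a barrier would separate these passes. */
        for (size_t i = 0; i < stride; ++i)
            scratch[i] += scratch[i + stride];
    }
    return scratch[0];
}

/* Compare-and-swap float add on the global accumulator. */
void global_add(_Atomic uint32_t *target, float operand) {
    FloatBits old_val, new_val;
    old_val.bits = atomic_load(target);
    do {
        new_val.value = old_val.value + operand;
    } while (!atomic_compare_exchange_weak(target, &old_val.bits, new_val.bits));
}

/* One atomic update per group instead of one per element. */
void accumulate(_Atomic uint32_t *global_sum, const float *data, size_t n_groups) {
    for (size_t g = 0; g < n_groups; ++g)
        global_add(global_sum, reduce_group(data + g * GROUP_SIZE));
}
```

In the original kernel each work-item writes to its own offset in bufOut, so whether this helps depends on how many work-items actually collide on the same address.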
Tugrul, June 25, 2017, 7:19am, post #6
You can use two integers: one for the integer part (an easy single operation) and one for the fractional part (bitwise interpretation).
When you are done with these, convert the two integers A and B to floats as A.0 and 0.B and add them.
Maybe you can even use one integer per digit after the decimal point.
before:
b=integerPart(floatingPointPart(num)*10);
c=integerPart(floatingPointPart(num)*100);
after:
atomic_add(… a …)
atomic_add(… b*10 …)
atomic_add(… c*100 …)
atomic_add(… d*1000 …)
A.BCDEFGH
num = A + B/10 + C/100 + D/1000 + E/10000 + F/100000 + G/1000000 + H/10000000 in the end.
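A simpler variant of the same idea is plain fixed-point accumulation: scale each float by a constant, round to an integer, add it with an ordinary integer atomic, and divide by the same constant once at the end. The sketch below is host-side C11 and deliberately not the per-digit scheme described above; FIXED_SCALE, to_fixed, from_fixed and fixed_add are hypothetical names, and the scale of 10000 (four decimal digits) is an arbitrary assumption. In a kernel, the atomic_fetch_add call would correspond to the integer atomic_add that the compiler does accept.

```c
#include <stdatomic.h>

#define FIXED_SCALE 10000 /* hypothetical: keeps 4 decimal digits */

/* Convert a float to fixed point, rounding to the nearest integer.
   With a 32-bit int and this scale, the running sum must stay within
   roughly +/-214748 or the accumulator overflows. */
int to_fixed(float x) {
    double scaled = (double)x * FIXED_SCALE;
    return (int)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Convert the accumulated fixed-point value back to a real number. */
double from_fixed(int v) {
    return (double)v / FIXED_SCALE;
}

/* Integer atomic add; no compare-and-swap retry loop is needed. */
void fixed_add(atomic_int *acc, float x) {
    atomic_fetch_add(acc, to_fixed(x));
}
```

The trade-off is precision and range: anything below 1/FIXED_SCALE is rounded away on every add. On the other hand, integer addition is associative, so the result does not depend on the order in which threads arrive.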