See section 6.11.11, “Atomic Functions”, of the OpenCL 1.1 specification:
“6.11.11 Atomic Functions
The OpenCL C programming language implements the following functions that provide atomic operations on 32-bit signed, unsigned integers and single precision floating-point to locations in __global or __local memory.”
For example, see the prototypes:
int atomic_add(volatile __global int *p, int val)
unsigned int atomic_add(volatile __global unsigned int *p, unsigned int val)
int atomic_add(volatile __local int *p, int val)
unsigned int atomic_add(volatile __local unsigned int *p, unsigned int val)
These show that atomic_add can be used on either a __global or a __local pointer, with signed or unsigned integers.
(In OpenCL 1.0, atomics are an extension, listed in section 9.5, and their names begin with atom_.)
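As a rough sketch of how both address spaces might be used together (OpenCL 1.1 names), the kernel below counts how many input values exceed a threshold. The kernel name count_above and its arguments data, n, threshold and total are made up for illustration. It assumes a 1D NDRange and that the host has zeroed *total before the launch.

```c
// Hypothetical example kernel: count the values in data[0..n) that exceed threshold.
// Assumes a 1D NDRange and that the host set *total to 0 before enqueueing.
__kernel void count_above(__global const int *data,
                          const unsigned int n,
                          const int threshold,
                          volatile __global unsigned int *total)
{
    // One counter per work-group, held in __local memory.
    volatile __local unsigned int group_count;

    const size_t gid = get_global_id(0);
    const size_t lid = get_local_id(0);

    // The first work-item of each group resets the group's counter.
    if (lid == 0)
        group_count = 0;
    barrier(CLK_LOCAL_MEM_FENCE);

    // __local overload: work-items in the same group update the shared counter.
    // The bounds check guards against a global size padded beyond n.
    if (gid < n && data[gid] > threshold)
        atomic_add(&group_count, 1u);
    barrier(CLK_LOCAL_MEM_FENCE);

    // __global overload: a single atomic per work-group merges the partial count.
    if (lid == 0)
        atomic_add(total, group_count);
}
```

Accumulating in __local memory first means each work-group issues one __global atomic rather than one per work-item, which is usually the cheaper pattern. Note that every work-item reaches both barrier calls, since no work-item returns early.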
If not then how can I create an efficient semaphore on some variable or data structure in the shared memory?
If you’re trying to use GPU “threads” the same way as CPU threads, you’ll be disappointed: they’re not the same thing, so the serialisation primitives you’d use for multi-threaded CPU code may not work very well at all.