cl_ext_atomic_counters_32 vs global memory atomics

OpenCL 1.1 has global atomics that work great on NVIDIA GPUs but are slower on AMD GPUs, where counter32_t from the cl_ext_atomic_counters_32 extension performs better. Here are the two versions:


//OpenCL 1.1 has atomic_inc built-in (no extension needed)
//see http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/atomic_inc.html
__kernel void  globalAtomicKernelOpenCL1_1( volatile __global int* counter)
{
	atomic_inc(counter);
}

//OpenCL 1.1 atomic device counters extension, usually faster on current AMD hardware
//http://www.khronos.org/registry/cl/extensions/ext/cl_ext_atomic_counters_32.txt
#pragma OPENCL EXTENSION cl_ext_atomic_counters_32 : enable
__kernel void  counterAtomicKernelExt( counter32_t counter)
{
	atomic_inc(counter);
}

Unfortunately, kernels using counter32_t don’t compile on platforms that don’t support this extension (such as current NVIDIA OpenCL implementations).

I’d rather not duplicate kernels just for this extension. Currently, a runtime find-and-replace in the kernel source, applied when ‘cl_ext_atomic_counters_32’ is not available, works:


		const char* globalAtomicsKernelStringPatched = globalAtomicsKernelString;
		if (!strstr(clInfo.m_deviceExtensions,"cl_ext_atomic_counters_32"))
		{
			globalAtomicsKernelStringPatched = findAndReplace(globalAtomicsKernelString,"counter32_t", "volatile __global int*");
		}
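The findAndReplace helper is not shown in the thread, so here is a minimal sketch of what it might look like. The name and signature are taken from the call above, but the implementation, including the assumption that it returns a newly malloc'd string the caller must free, is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the thread): returns a newly malloc'd copy of
   src with every non-overlapping occurrence of find replaced by replace.
   The caller must free() the result. Returns NULL if find is empty or if
   allocation fails. */
char *findAndReplace(const char *src, const char *find, const char *replace)
{
    size_t findLen = strlen(find);
    size_t replaceLen = strlen(replace);
    size_t count = 0;
    const char *p;

    if (findLen == 0)
        return NULL;

    /* First pass: count occurrences so the output can be sized exactly. */
    for (p = strstr(src, find); p != NULL; p = strstr(p + findLen, find))
        count++;

    /* count * findLen never exceeds strlen(src), so this cannot underflow. */
    size_t outLen = strlen(src) - count * findLen + count * replaceLen;
    char *out = malloc(outLen + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy the text before each match, then the replacement. */
    char *dst = out;
    const char *start = src;
    while ((p = strstr(start, find)) != NULL) {
        size_t chunk = (size_t)(p - start);
        memcpy(dst, start, chunk);
        dst += chunk;
        memcpy(dst, replace, replaceLen);
        dst += replaceLen;
        start = p + findLen;
    }
    strcpy(dst, start); /* copy the remaining tail and the terminating NUL */
    return out;
}
```

With this sketch, globalAtomicsKernelStringPatched points either at the original string or at a malloc'd copy, so it should only be freed in the branch that patched it. Note also that a plain text substitution rewrites every occurrence of "counter32_t", including any inside comments or longer identifiers.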

Does anyone have a better suggestion?
Thanks,
Erwin

Don’t forget that you also have a C preprocessor to serve your needs at compile time. I just do something like this inside the OpenCL C source:

#ifdef cl_ext_atomic_counters_32
#pragma OPENCL EXTENSION cl_ext_atomic_counters_32 : enable
#define countert counter32_t
#else
// if you want it for OpenCL 1.0 as well: #pragma … int32_base_atomics
#define countert volatile global int *
#endif

Then the code that uses it doesn’t need anything special:

kernel void globalAtomicKernelOpenCLAny(countert counter)
{
	atomic_inc(counter);
}
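With this approach the kernel source needs no host-side patching. If host code still wants to know whether the extension is present, keep in mind that CL_DEVICE_EXTENSIONS is a space-separated list, and a bare strstr() would also accept a longer extension name that merely contains the one being searched for. A minimal sketch of an exact token check follows; the function name hasExtension is hypothetical, and the extension string is assumed to have been obtained from clGetDeviceInfo beforehand.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true only if `name` appears as a complete
   space-delimited token in `extensions` (the CL_DEVICE_EXTENSIONS format). */
bool hasExtension(const char *extensions, const char *name)
{
    size_t nameLen = strlen(name);
    const char *p = extensions;

    if (nameLen == 0)
        return false;

    while ((p = strstr(p, name)) != NULL) {
        /* The match must start the string or follow a space... */
        bool startOk = (p == extensions) || (p[-1] == ' ');
        /* ...and must end the string or be followed by a space. */
        char after = p[nameLen];
        bool endOk = (after == '\0') || (after == ' ');
        if (startOk && endOk)
            return true;
        p++; /* keep searching after this partial match */
    }
    return false;
}
```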

Ah, indeed, I didn’t realize that each supported extension is also defined as a preprocessor macro.

Thanks for the help!
Erwin