Well I have some fortunate / unfortunate news.
I went on Craigslist and picked up an ATI Radeon 6850 for $80. It’s newer than my GTX280 and I could get it cheap second hand, so I figured it was worth a shot.
The 6850 is better than the GTX280 (2,747 vs 1,971 in PassMark), but it has some other issues. The max workgroup size is 256 on the newer Radeon, while it was higher on the older Nvidia (not sure if this is an issue or simply an architectural difference, but for now I’m calling it an issue :-P). Also (I’m not sure how big of a deal this is yet), double precision is supported on the older Nvidia and NOT supported (at all) on the ATI card. I’m guessing this has to do with the GTX280 being a “super” enthusiast card at the time while the 6850 wasn’t the top of its line; a 69XX would probably have supported double precision, being the top model of the series at the time…hopefully this won’t bite me in the ass.
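For anyone hitting the same 256 limit, here is a rough sketch of the kind of host-side clamp that gets around it. It assumes the device limit has already been read with clGetDeviceInfo and CL_DEVICE_MAX_WORK_GROUP_SIZE; the helper names clamp_local_size and round_up_global_size are made up for illustration and are not taken from the actual project code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pick a local (workgroup) size that does not exceed
 * the device limit. device_max is assumed to come from
 * clGetDeviceInfo(dev, CL_DEVICE_MAX_WORK_GROUP_SIZE, ...). */
size_t clamp_local_size(size_t wanted, size_t device_max)
{
    assert(wanted > 0 && device_max > 0);
    return wanted < device_max ? wanted : device_max;
}

/* Hypothetical helper: round the global size up to a multiple of the local
 * size, since OpenCL 1.x wants the global size evenly divisible by the
 * local size. The kernel then has to bounds-check get_global_id(0)
 * against the real item count. */
size_t round_up_global_size(size_t items, size_t local)
{
    size_t rem;
    assert(local > 0);
    rem = items % local;
    return rem == 0 ? items : items + (local - rem);
}
```

With this, a kernel launch written for a larger workgroup just ends up with more, smaller groups on the 6850 instead of failing at enqueue time.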
Anyway, onto the fix.
After fixing the workgroup issue and the rather annoying matter of converting the kernel to not use double precision values (as well as some other odds and ends where the ATI implementation seems more touchy), everything works PERFECTLY. ALL of these BS issues went away. I reverted back to the original statement in my kernel and it executes without issue; any combination (including the statements below) works without issue, as it should.
if (0)
if (1)
I’m very sad that this fixed it… I feel Nvidia has definitely let me down this time in terms of their implementation…but maybe something in the Nvidia world had to be dealt with differently…? Either way, this solution makes me annoyed.
Moral:
DON’T USE OPENCL ON NVIDIA CARDS
EDIT:
The 6850 is probably NOT ideal because it does NOT support double precision via either the cl_khr_fp64 or the cl_amd_fp64 extension. This was verified via the get info call to pull all extensions, so a 69XX or 79XX, which will support DPFP, is probably ideal. This is quite annoying :-/
6850 extensions:
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_3d_image_writes
cl_khr_byte_addressable_store
cl_khr_gl_sharing
cl_ext_atomic_counters_32
cl_amd_device_attribute_query
cl_amd_vec3
cl_amd_printf
cl_amd_media_ops
cl_amd_popcnt
Though at least the ATI card gives me the ability to debug via their printf extension…
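A minimal sketch of how that check can be automated instead of eyeballing the list. It assumes the space-separated extension string has already been fetched with clGetDeviceInfo and CL_DEVICE_EXTENSIONS; has_extension and supports_fp64 are hypothetical helper names, not part of the OpenCL API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if `name` appears as a whole token in the
 * space-separated extension list (the format CL_DEVICE_EXTENSIONS uses). */
bool has_extension(const char *ext_list, const char *name)
{
    size_t len;
    const char *p;

    assert(ext_list != NULL && name != NULL && name[0] != '\0');
    len = strlen(name);
    p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* Only accept matches bounded by spaces or the string ends, so
         * "cl_amd" does not match "cl_amd_printf". */
        bool starts = (p == ext_list) || (p[-1] == ' ');
        bool ends = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends)
            return true;
        p++;
    }
    return false;
}

/* Double precision is usable if either the Khronos or the AMD
 * extension is advertised. */
bool supports_fp64(const char *ext_list)
{
    return has_extension(ext_list, "cl_khr_fp64") ||
           has_extension(ext_list, "cl_amd_fp64");
}
```

Fed the 6850 list above, supports_fp64 should come back false, which matches what I saw by hand.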
I’m guessing there’s not, but is there any way to get DPFP with this card, if anybody knows? I’m trying right now to organize the code in such a way that if I bump into a machine which supports it I can easily switch a few setup variables. I can’t recall what the hell the computer in the lab uses, I know it’s a 7XXX but I forgot which. The end system will be that, I believe.
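Roughly what I mean by "a few setup variables", as a sketch only. The names precision_config, choose_precision, buffer_bytes, USE_DOUBLE and the real typedef are all invented for this example. The idea is that one flag (for instance, the result of an extension check) picks both the host-side element size and a build option that would be handed to clBuildProgram, and the kernel source only ever uses the real type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical setup struct: everything that depends on precision. */
typedef struct {
    int         use_double;     /* 1 if the device advertises fp64 */
    const char *build_options;  /* would be passed to clBuildProgram */
    size_t      real_size;      /* bytes per element in host buffers */
} precision_config;

/* Pick float or double once, based on whether the device has fp64. */
precision_config choose_precision(int device_has_fp64)
{
    precision_config cfg;
    cfg.use_double = device_has_fp64 ? 1 : 0;
    cfg.build_options = cfg.use_double ? "-D USE_DOUBLE=1" : "-D USE_DOUBLE=0";
    cfg.real_size = cfg.use_double ? sizeof(double) : sizeof(float);
    return cfg;
}

/* Size in bytes of a buffer holding `count` elements of the chosen type. */
size_t buffer_bytes(precision_config cfg, size_t count)
{
    assert(cfg.real_size > 0);
    return count * cfg.real_size;
}

/* Preamble to prepend to the kernel source; kernels use `real` throughout. */
const char *const kernel_preamble =
    "#if USE_DOUBLE\n"
    "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n"
    "typedef double real;\n"
    "#else\n"
    "typedef float real;\n"
    "#endif\n";
```

On the 6850 this would take the float path; on a card that reports an fp64 extension, the same code should switch over by changing only the one flag. An AMD-only device that reports cl_amd_fp64 instead would need the pragma adjusted to match.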
EDIT 2:
This is super annoying because the 6850 has OpenCL version 1.1, I believe, yet the OpenCL spec for the cl_khr_fp64 extension states that it should exist in anything past version 1.0.
So in short:
- It doesn’t seem AMD is following the spec 100%
- It’s not tripping out like the Nvidia card, though.