OpenCL Failing At Conditionals

This doesn’t crash:

if ((z >= attributes[18] ) && (y < 1 + .05))

If I remove the first part of that…this crashes:

if ((y < 1 + .05))

So, attributes[18] is like 10 or something, z is a calculated number, as is y.

It’s returning CL_OUT_OF_RESOURCES from the NDRange call.
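
For concreteness, the failure shows up roughly like this; this is a sketch rather than my actual host code (assuming the usual CL/cl.h and stdio.h includes, a queue and kernel that are already set up, and a 600x600 global size, which is where the 360k grid mentioned below comes from):

size_t global[2] = {600, 600};  /* assumed global size: 600x600 = 360k work items */
size_t local[2]  = {20, 20};    /* local work-group size under test */

cl_int err = clEnqueueNDRangeKernel(queue, kernel, 2, NULL,
                                    global, local, 0, NULL, NULL);
if (err == CL_OUT_OF_RESOURCES)
    printf("NDRange enqueue failed: CL_OUT_OF_RESOURCES\n");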

The body of the if statement isn’t directly related to the condition:

_cellGrid[(W * 600) - (600 - H)].value = .8;

W and H are global IDs from the two dimensions…but I replaced that line with

_cellGrid[(10 * 600) - (600 - 20)].value = .8;

That index is most definitely defined on a 360k-length grid; I did this just to be sure the IDs weren’t running into any weird errors. The error still persists, exactly the same.

I’d say “okay, maybe it’s a memory issue” if it were crashing when I ADDED the memory access, but it crashes when I REMOVE the memory access. This is a ridiculous problem and I have no idea where to start debugging. I tried disabling compiler optimizations, which didn’t get me anywhere. Yes, the cellGrid access is always defined; I’ve made sure of that. For some reason that if statement is dying terribly and I cannot figure out why.
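
(To be concrete about “disabling compiler optimizations”: I mean passing the standard -cl-opt-disable option at build time, roughly like this, with program and device being the existing objects:)

/* turn off the OpenCL compiler's optimizations for this program */
cl_int err = clBuildProgram(program, 1, &device, "-cl-opt-disable", NULL, NULL);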

If I change the if statement to any constant, it crashes, i.e. if (1), if (0). ALL of the above works on the CPU version…Not sure what the heck could make an if statement behave this erratically.

Any advice?

Card is a GTX280.

Your problem seems to be caused by the NV driver. NV drivers have shown these kinds of weird behaviours.

The underlying message is: if you want GPGPU on our cards, use CUDA.

P.S.: Search for my messages in these forums and you will see some other ridiculous problems (like how removing an “if(0) {}” in the middle of a kernel affected numerical results).

So in short, Nvidia’s OpenCL implementation blows and I should have been using CUDA?

EDIT:
And there’s no fix or anything? They’re just random errors…?

BTW: In case it matters any I’m on Ubuntu 12.04, driver = 295.49.

If the code is correct it should work. Last time I used NV it had some issues and ‘known bugs’, but so does AMD. NV might not focus on OpenCL, but remember OpenCL is only a thin layer on top of / alongside CUDA anyway, so there isn’t much to stuff up.

It might be the calculation of y that is causing the problem. Perhaps the short-circuit expression is causing it not to be calculated in some cases. Try using the non-short-circuit version (use & instead of &&). (This is a bit of a long shot, but it’s all I can think of.)
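
Something like this, using the condition and body from your first post (nothing new, just & swapped in for &&):

/* short-circuit &&: if the left comparison is false, y's comparison is never evaluated */
if ((z >= attributes[18]) && (y < 1 + .05))
    _cellGrid[(W * 600) - (600 - H)].value = .8;

/* non-short-circuit &: both comparisons are always evaluated, then combined;
   each comparison yields 0 or 1, so the test itself behaves the same */
if ((z >= attributes[18]) & (y < 1 + .05))
    _cellGrid[(W * 600) - (600 - H)].value = .8;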

I’m sure your description of your code is accurate, but from it there’s really no way for anyone else to decipher what you’re doing and provide any more detailed advice.

I hear you. I understand the skepticism that I’m not describing the entire problem, or I’m leaving out info that I don’t think is vital.

IMO, according to the C99 spec, it shouldn’t matter how it’s calculated. I have code in there along the lines of:

double y = ...

There are no divisions, so division by zero is not possible. That being said, I feel like y should be set to SOMETHING in accordance with the C99 spec; even if it’s not what I want, it should in no way be causing a memory error.

Also, to circumvent Y entirely, if I change it to simply:

if (1)

It crashes.

One may say “perhaps something in the body isn’t always accessible”, which is a fair point, but the only thing accessed in the body is a cell being set to a hard-coded value, and that cell is part of a vector of doubles being passed in.

To further back this claim, changing it to:

if (0)

also causes the code to crash here. Now there’s NO memory being accessed…

So somebody might say “what if a variable is being initialized in there?” I’ll say that’s a fair claim, but there’s only that one assignment, and taking any possible path beneath it will still only lead to writes. There are NO reads from that memory location, as it is only an output.

I’ll be testing an ATI card tonight (hopefully tonight). If these mystery problems go away, we can safely (probably not ‘safely’) blame Nvidia. If they persist, we can pretty safely blame me.

Well I have some fortunate / unfortunate news.

I went on Craigslist and picked up an ATI Radeon 6850 for $80; since it’s newer than my GTX280 and I could get it cheap second-hand, I thought it was worth a shot.

The 6850 is better than the GTX280 (2,747 vs 1,971 in PassMark), but it has some other issues. The max workgroup size is 256 on the newer Radeon, while it was higher on the older Nvidia (not sure if this is an issue or simply an architectural difference, but for now I’m calling it an issue :-P). Also (I’m not sure how big of a deal this is yet), double precision is supported on the older Nvidia and NOT supported (at all) on the ATI card. I’m guessing this has to do with the GTX280 being a “super” enthusiast card at the time, while the 6850 wasn’t the highest; a 69XX, being the top model of the series at the time, would probably have supported double precision…hopefully this won’t bite me in the ass.

Anyway, onto the fix.

After fixing the workgroup issue, the rather annoying matter of converting the kernel to not use double-precision values, and some other odds and ends where the ATI implementation seems more touchy…

Everything works PERFECTLY. ALL of these BS issues went away. I reverted back to the original statement in my kernel and it executes without issue; any combination (including the statements below) works without issue, as it should.

if (0)
if (1)

I’m very sad that this fixed it… I feel Nvidia has definitely let me down this time in terms of their implementation…but maybe something in the Nvidia world had to be dealt with differently…? Either way, this solution makes me annoyed.

Moral:
DON’T USE OPENCL ON NVIDIA CARDS

EDIT:
The 6850 is probably NOT ideal because it does NOT support double precision via either the cl_khr_fp64 or the cl_amd_fp64 extension. This was verified via the device info call that pulls all extensions, so a 69XX or 79XX, which will support DPFP, is probably ideal. This is quite annoying :-.
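
(For reference, the verification was basically this kind of query; it assumes a device handle named device and a buffer big enough for the extension string:)

char extensions[2048];
clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, sizeof(extensions), extensions, NULL);
/* neither substring shows up in the 6850's extension string */
if (!strstr(extensions, "cl_khr_fp64") && !strstr(extensions, "cl_amd_fp64"))
    printf("no double-precision extension reported\n");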

6850 extensions:

cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_3d_image_writes
cl_khr_byte_addressable_store
cl_khr_gl_sharing
cl_ext_atomic_counters_32
cl_amd_device_attribute_query
cl_amd_vec3
cl_amd_printf
cl_amd_media_ops
cl_amd_popcnt

Though at least the ATI card gives me the ability to debug via their printf extension…
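
(A rough sketch of what that kernel-side debugging looks like; the kernel and argument names are made up for illustration:)

#pragma OPENCL EXTENSION cl_amd_printf : enable

__kernel void debug_example(__global float *cells)
{
    int gid = get_global_id(0);
    /* prints straight from the device, which beats guessing what a kernel is doing */
    printf("gid %d -> %f\n", gid, cells[gid]);
}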

I’m guessing there isn’t, but is there any way to get DPFP with this card, if anybody knows? I’m trying right now to organize the code in such a way that if I bump into a machine which supports it, I can easily switch a few setup variables. I can’t recall what the hell the computer in the lab uses; I know it’s a 7XXX but I forgot which. The end system will be that, I believe.
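
What I’m aiming for is roughly this kind of switch in the kernel source, with the host deciding at build time (real_t and USE_FP64 are just placeholder names I made up):

#ifdef USE_FP64
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
typedef double real_t;   /* full double precision when the device supports it */
#else
typedef float real_t;    /* fall back to 32-bit floats otherwise */
#endif

__kernel void step(__global real_t *cells)
{
    int gid = get_global_id(0);
    cells[gid] = (real_t)0.8;   /* same kernel body, precision picked at build time */
}

On the host side it’s then just a matter of passing "-DUSE_FP64" (or nothing) to clBuildProgram, depending on what the extension query reports.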

EDIT 2:
This is super annoying because the 6850 has OpenCL version 1.1, I believe, yet the OpenCL spec for the cl_khr_fp64 extension states that it should exist in anything past version 1.0.

So in short:

  • It doesn’t seem AMD is following the spec 100%
  • It’s not tripping out like the Nvidia card, though.

6850
Device Version:

OpenCL 1.2 AMD-APP (923.1)

Driver Version:

CAL 1.4.1720

So seeing as the cl_khr_fp64 (or any DPFP, for that matter) extension should exist in versions past OpenCL 1.0, it seems AMD doesn’t follow the spec 100%; see the previous post for the list of 6850 extensions.

UPDATE:
So as in my previous post, I had to drop DPFP with the ATI card, as the 6850 doesn’t support it. I also had to change the work-group size, because the max WG size of my old-ass Nvidia card (GTX 280) is 512 but on the newer ATI card it’s only 256 (not quite sure why).

So yeah, I changed my WG size to 15x15 so it works properly with the ATI card (as my original local WG size of 20x20 creates 400 work items, which is over the 256 max) and converted all the FP to standard 32-bit floats instead of 64-bit doubles.

So I moved last week (what a hassle), and while I had my computer apart to dust it out I got the idea to try the Nvidia card in Windows with OpenCL, just so I didn’t count it out too fast. So I threw the Nvidia card in, moved to my new place, and after MUCH hassle got it to compile in Windows (what a pain in the ass development is in Windows). The Nvidia card worked FINE. I even changed the local WG size to 20x20 again. I thought this meant it had to be working fine.

At this point I thought it was a problem with the Nvidia driver in Linux (ie. not a problem with the Windows version). Then I said “before I continue, let me do one last thing to equate this to the original Linux environment - so I can be sure that the problem is solved.”

So I converted everything to use DPFP again (doubles) and it died the same exact way: it runs out of resources. I was about to cry when, just for shits and ha-has, I kicked the local WG size down to 15x15…and it runs…just fine.

My theory at the present time is that I can’t use a 20x20 work group, even though that’s only 400 work items (compared to the 512 max), simply because that PLUS DPFP runs out of something (memory or registers or whatever, no idea; the error isn’t specific)! So even though everything should have worked individually, together it wouldn’t.

That’s my theory right now. I’m going to go back to that same exact Linux environment with my original build, knock the WG down to 15x15, and see what happens. My theory is that it’ll work fine…I’ll come back with results!

EDIT:
This is using OpenCL 1.0 with the Nvidia 305.53 driver.

The 512 work item max is for a simple kernel. A more complex kernel may not be able to do that. You can check how many work items a given kernel can do by calling clGetKernelWorkGroupInfo with CL_KERNEL_WORK_GROUP_SIZE.
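
Something along these lines, assuming the kernel and device handles are already in hand:

size_t kernel_wg_max = 0;
clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                         sizeof(kernel_wg_max), &kernel_wg_max, NULL);
/* this can be well below the device-wide CL_DEVICE_MAX_WORK_GROUP_SIZE,
   e.g. when the kernel uses a lot of registers (doubles make that worse) */
printf("this kernel supports at most %zu work items per group\n", kernel_wg_max);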

Yeah, that’s the call that gave me the 512 I was basing it on.

I tried lowering it to a 2x2 workgroup and was still having issues (this was just today), and I was able to fix it by changing all my doubles to floats. Then I was able to use 15x15 WGs again…

Think I’m staying away from doubles for the time being; they’re killing me.