Cross-platform issues MacOSX / Linux

Hi everyone,

I have a simple kernel [ http://goo.gl/PBTsg ] that has been working OK on MacOSX (ATI HD4850 / nVidia GeForce whatever) for a while. I am using JOCL as a wrapper, and here is the code that invokes the kernel if anyone is curious [ http://goo.gl/qFhel ].

I decided to give it a go on Linux SUSE Enterprise 11 x64 (nVidia Tesla “Fermi” M2050) and other than a few tweaks to make the compiler happy (same tweaks applied to the version on mac and tested over there too) everything seems fine … except it’s not!

I am running the same computation (defined in the kernel) over around 300 items in parallel with the exact same initial conditions (all the input buffers are populated with the exact same values) and plotting the results, so all the plots should look exactly the same.

When I run on MacOSX all the plots show the waveform I expect and look the same. On Linux some of them are completely empty. The ones that are not empty are fine (which confirms the kernel is doing its job), but most of them seem to be missing data altogether.

I recall something similar from when I was trying to use double precision on machines that didn’t support it. In that case, for some reason, only the first half of the elements were being processed (so if I had 300, only the first 150 were being computed). In this case, though, I cannot find any obvious pattern.

I understand I cannot ask people to solve this for me with so little info, so all I am asking is for someone with a bit of experience in typical OpenCL cross-platform issues to have a look at my kernel to see if they can spot any of the usual suspects.

Any help/advice/suggestions in terms of troubleshooting appreciated!

Just a guess: do your input/output buffers need to be aligned (which is hard to control in Java)? That would mean some runs happen to be aligned and thus work, while other runs happen to be misaligned and thus do not work. But this is just a guess.
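One way to test that guess would be to take alignment out of the equation by allocating the host buffers with a known alignment. Below is a minimal sketch, assuming Java 9 or later (for `ByteBuffer.alignedSlice`) and assuming a 64-byte alignment is enough; the class name, the helper `allocateAligned`, and the 64-byte figure are all made up for illustration, and whether alignment matters at all for this kernel is unconfirmed.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class AlignedBuffers {

    // Hypothetical helper: returns a direct buffer of sizeBytes whose start
    // address is a multiple of alignment (alignment must be a power of two).
    static ByteBuffer allocateAligned(int sizeBytes, int alignment) {
        // Round the size up to a multiple of the alignment, then over-allocate
        // by one alignment unit so an aligned window of that size always fits.
        int padded = ((sizeBytes + alignment - 1) / alignment) * alignment;
        ByteBuffer raw = ByteBuffer.allocateDirect(padded + alignment);

        // alignedSlice() trims the start (and end) to aligned addresses.
        ByteBuffer aligned = raw.alignedSlice(alignment);
        aligned.limit(sizeBytes);

        // The slice does not inherit the byte order, so set it explicitly.
        aligned.order(ByteOrder.nativeOrder());
        return aligned;
    }

    public static void main(String[] args) {
        // 300 items of one float each, as a stand-in for a real input buffer.
        ByteBuffer input = allocateAligned(300 * Float.BYTES, 64);
        System.out.println("offset from 64-byte boundary: "
                + input.alignmentOffset(0, 64));
        System.out.println("usable bytes: " + input.remaining());
    }
}
```

If the empty plots still show up with buffers allocated like this, alignment is probably not the cause.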

Thanks – what exactly do you mean by aligned?

If you mean the exact same values for all the input and output buffers, then the answer is yes, that's the way I expect them to be. Even so, the computation is independent of that, so it should still work, just with slightly different results. My problem is that some of the results are coming back with default values (I am initializing the buffers), so it looks like nothing is happening at all for some of the elements.

If I do a dump of all inputs and then all outputs I should be able to see what’s what – working on that at the moment.
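For the output side of that dump, something along these lines might help spot which items never got written. It is only a sketch: it assumes the output buffer is pre-filled with a sentinel (NaN here) before the kernel runs, and that each item owns a contiguous block of `samplesPerItem` floats. The class, the method `findUntouchedItems`, and that layout are hypothetical, not taken from the linked code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OutputDump {

    // Returns the indices of items whose samples all still equal the sentinel,
    // i.e. items the kernel apparently never wrote to.
    // Assumed layout: item i owns output[i * samplesPerItem .. (i + 1) * samplesPerItem).
    static List<Integer> findUntouchedItems(float[] output, int samplesPerItem, float sentinel) {
        List<Integer> untouched = new ArrayList<>();
        int itemCount = output.length / samplesPerItem;
        for (int item = 0; item < itemCount; item++) {
            boolean allSentinel = true;
            for (int s = 0; s < samplesPerItem; s++) {
                // Float.compare treats NaN as equal to NaN, unlike ==.
                if (Float.compare(output[item * samplesPerItem + s], sentinel) != 0) {
                    allSentinel = false;
                    break;
                }
            }
            if (allSentinel) {
                untouched.add(item);
            }
        }
        return untouched;
    }

    public static void main(String[] args) {
        int items = 6;
        int samplesPerItem = 4;
        float[] output = new float[items * samplesPerItem];
        Arrays.fill(output, Float.NaN); // sentinel written before the "kernel" runs

        // Stand-in for the real kernel: pretend only items 0, 1 and 4 were computed.
        for (int item : new int[] {0, 1, 4}) {
            for (int s = 0; s < samplesPerItem; s++) {
                output[item * samplesPerItem + s] = (float) Math.sin(s);
            }
        }

        System.out.println("Untouched items: "
                + findUntouchedItems(output, samplesPerItem, Float.NaN));
    }
}
```

Printing the untouched indices across a few runs should show whether they follow a pattern (a contiguous range, every other item, a fixed cutoff) or vary from run to run.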