Failed Buffer Access

Hi guys…
I'm pretty new to OpenCL, though not to programming in general. Unfortunately it's very hard to detect errors in OpenCL kernels, so I would like some advice. In theory this should be pretty easy, but it doesn't seem to be. The problem only appears when I use an NVIDIA device, not when I use a CPU device. Here is my problem:

Kernel:
typedef struct s_Point
{
    float m_X;
    float m_Y;
}
Point;

typedef struct s_PointTriple
{
    Point m_PointOne;
    Point m_PointTwo;
    Point m_PointThree;
}
PointTriple;

//////////////////////////////////////////////////////////////////////////////
// Kernel to update the angular frequency and the height of the waves
///////////////////////////////////////////////////////////////////////////////
// GridPoint is defined elsewhere (not shown here); only its float member m_X is used
__kernel void test(__global GridPoint* in_BufferOne, __global PointTriple* in_BufferTwo)
{
    __private Point l_Point;
    unsigned int l_X, l_Y, l_Index;

    // one work-item per grid cell: row-major index into both buffers
    l_X = get_global_id(0);
    l_Y = get_global_id(1);
    l_Index = get_global_size(0) * l_Y + l_X;

    // calc something for l_Point using buffer one
    l_Point.m_X = in_BufferOne[l_Index].m_X * 0.1f;
    l_Point.m_Y = l_Point.m_X; // placeholder so m_Y is initialized before it is stored

    // save result
    in_BufferTwo[l_Index].m_PointOne.m_X = l_Point.m_X;
    in_BufferTwo[l_Index].m_PointOne.m_Y = l_Point.m_Y;
    in_BufferTwo[l_Index].m_PointTwo.m_X = l_Point.m_X;
    in_BufferTwo[l_Index].m_PointTwo.m_Y = l_Point.m_Y;
    in_BufferTwo[l_Index].m_PointThree.m_X = l_Point.m_X;
    in_BufferTwo[l_Index].m_PointThree.m_Y = l_Point.m_Y;
}

The local size is set automatically and my global dimensions are 512x512. The buffer size matches my calculation (6291456 bytes) and the buffers are valid. Any suggestions as to what could be wrong?

I ran some experiments, always writing to a single buffer position:
in_BufferTwo[0] = 1.0; -> works (both kinds of devices)
in_BufferTwo[162144] = 1.0; -> works (both kinds of devices)
in_BufferTwo[262144] = 1.0; -> fails when using the NVIDIA device | works when using the Intel CPU device

I’ve only been using OpenCL for a couple of months but I’ve made a lot of progress, so I’ll try asking a question or two here to see if I can help:

It looks like sizeof(PointTriple) is 24. 24 * 262144 is 6291456, or exactly 6 MB (6*1024*1024). You don’t say which NVIDIA device you have, but the 330M in my MacBook Pro has just 512kb. Are you sure that your buffer fits on your device?

Or, if your buffer is allocated at exactly 6 MB in size [you say the buffer size is 6291456], and you try writing a float at that address (6291456), you’re writing off the end of the array in that single operation anyway.
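
To make the arithmetic above easy to re-check, here is a minimal host-side sketch in plain C. It assumes the host compiler lays out the two structs the same way as the kernel does (24 bytes per PointTriple); the helper names buffer_bytes and index_in_bounds are made up for illustration and are not part of OpenCL.

```c
#include <stddef.h>

/* Host-side mirror of the kernel structs from the first post. */
typedef struct s_Point { float m_X; float m_Y; } Point;
typedef struct s_PointTriple
{
    Point m_PointOne;
    Point m_PointTwo;
    Point m_PointThree;
} PointTriple;

/* Bytes needed for one PointTriple per work-item in a width x height range. */
size_t buffer_bytes(size_t width, size_t height)
{
    return width * height * sizeof(PointTriple);
}

/* Returns 1 if element `index` lies inside a buffer of `bytes` bytes, else 0.
   Valid element indices run from 0 to (bytes / sizeof(PointTriple)) - 1. */
int index_in_bounds(size_t index, size_t bytes)
{
    return index < bytes / sizeof(PointTriple);
}
```

With a 512x512 range, buffer_bytes gives 6291456, so the buffer holds 262144 elements and the last valid element index is 262143. Index 162144 from the experiments is inside the buffer; index 262144 is the first one past the end.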

What kind of failure or error message are you getting?

Have you made sure that sizeof(Point) and sizeof(PointTriple) are what you expected? Different compilers may produce structs of different sizes, in which case the buffer may need to be larger than 6 MB.
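
One way to pin down the host half of that check is a set of compile-time assertions. This is only a sketch: it assumes a C11 compiler and the struct definitions from the first post, and it says nothing about the device compiler, whose sizes would have to be checked separately (for example by having a kernel write sizeof(PointTriple) into a buffer).

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */

typedef struct s_Point { float m_X; float m_Y; } Point;
typedef struct s_PointTriple
{
    Point m_PointOne;
    Point m_PointTwo;
    Point m_PointThree;
} PointTriple;

/* Compilation fails if the host layout differs from the expected one. */
static_assert(sizeof(Point) == 8, "Point is expected to be 8 bytes");
static_assert(sizeof(PointTriple) == 24, "PointTriple is expected to be 24 bytes");
static_assert(offsetof(PointTriple, m_PointTwo) == 8, "unexpected padding before m_PointTwo");
static_assert(offsetof(PointTriple, m_PointThree) == 16, "unexpected padding before m_PointThree");
```

If any of these fire, the host-side buffer size calculation and the kernel's view of the data no longer agree.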

@david.garcia:
Yeah, I've checked the sizes more than once: sizeof(PointTriple) = 24 bytes | sizeof(Point) = 8 bytes. I've double-checked it with the CPU and GPU versions.

@Photovore:
Yeah, it is exactly 6 MB. And you're right, I should have mentioned what kind of NVIDIA device I have: a GTX 470, so 6 MB shouldn't be the problem.

clFinish() fails when I try to access the memory. The error code says that the queue is invalid: error code -36.
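
For reference, -36 is CL_INVALID_COMMAND_QUEUE in the Khronos CL/cl.h header. A small lookup like the sketch below can make such codes easier to read in logs. The function name cl_error_name is hypothetical, the table covers only a handful of codes, and the numeric values are written out so the snippet compiles without the OpenCL headers.

```c
/* Maps a few OpenCL status codes to their names.
   The numeric values are the ones defined in CL/cl.h. */
const char *cl_error_name(int code)
{
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -36: return "CL_INVALID_COMMAND_QUEUE";
    case -38: return "CL_INVALID_MEM_OBJECT";
    case -52: return "CL_INVALID_KERNEL_ARGS";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    default:  return "unknown OpenCL error code";
    }
}
```

A queue that was created successfully but reports this code only after a kernel has run may have been invalidated by a fault inside the kernel, which would be consistent with an out-of-bounds write, though the code alone does not prove it.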

Just to reiterate … you say above, in your first message:

“The bufferSizes fits my calculation (6291456 bytes)”

That is not big enough, it seems, for what you say later.

If your buffer size is 6291456 bytes, then, since it is zero-based, element number 6291456 (which you try to write to, unsuccessfully) is already one byte off the end of your buffer. Your buffer, if the size is what you say, contains addressable bytes numbered from 0 to 6291455. If you write to address [buffer + 6291456], you are already writing outside of the allocated array bounds, even if you’re only writing one byte, and I would hope (!) that you would get an error message. (I thought you were writing a float, where there were already no bytes left to write into, per the allocation – so, that would be four bytes beyond the end of your array.)

Maybe I have misunderstood you. But, if I have not, then I suggest that you make extra-sure that your memory allocations are sufficient! Cheers!

p.s. aaaaaah, error number -36; such fond memories (well, I haven’t solved mine yet!)

Photovore, thanks a lot… you gave me the hint. The calculation of the index used for the access was not correct. Step-by-step debugging and ruling out some stupid things did the trick. Low-level debugging!!! Hope you find a solution for your problem soon.
Thanks a lot…
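
The thread does not show what the faulty index calculation looked like, so the following is only a sketch of how such a bug might be caught on the host before it reaches the GPU. It replays the row-major formula from the posted kernel (get_global_size(0) * l_Y + l_X) in plain C and reports the largest index it produces; linear_index and max_linear_index are hypothetical helper names.

```c
#include <stddef.h>

/* Same row-major mapping as the posted kernel. */
size_t linear_index(size_t x, size_t y, size_t width)
{
    return y * width + x;
}

/* Walks every (x, y) pair of the global range and returns the largest
   index produced, so it can be compared against the element count. */
size_t max_linear_index(size_t width, size_t height)
{
    size_t max = 0;
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            size_t i = linear_index(x, y, width);
            if (i > max) {
                max = i;
            }
        }
    }
    return max;
}
```

For a 512x512 range the largest index should be 262143, one less than the 262144 elements in the buffer. Any index formula that yields 262144 or more for some work-item writes past the end, which a CPU device may tolerate silently while a GPU device fails.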