Problem with width and reference of element in array

Hi,

In the code below:

__kernel void sivia( __global unsigned int *FUNCOES_POR_KERNEL )
{
    /* struct _intervalo (two floats) and TAM_LISTA (500) are defined elsewhere. */
    struct _intervalo lista[1344][TAM_LISTA];

    for ( int i = 0; i < 1344; i++ )
    {
        lista[i][0].inferior = -4;
        lista[i][0].superior = 4;
        lista[i][1].inferior = -4;
        lista[i][1].superior = 4;

        /* get_global_id() returns size_t, so cast it to match %d. */
        printf("\n\nT = %d [%.3f][%.3f][%.3f][%.3f] - %d", (int)get_global_id(0),
               lista[i][0].inferior,
               lista[i][0].superior,
               lista[i][1].inferior,
               lista[i][1].superior,
               0);
    }
}

This code only works when the first dimension of my array lista is less than 134. If it is larger than that, the kernel is submitted for launch but never runs; the printf, for instance, prints nothing. Compilation goes well, and everything on the host before the kernel launch works fine. But when clEnqueueNDRangeKernel is called, nothing happens and no error is reported. TAM_LISTA is 500.

But if I replace the index i in lista[i] with a constant, 1343 for instance, the code works fine.

I need this array to be at least 1344 x 500.

Can anyone help, please?

Thanks,

Luiz.

I don't know how your data type is defined, but I think this will blow past your memory. An array declared at that point, without an address space qualifier, goes into private memory (or local/shared memory if it is marked __local), and both are rather small. Check this, and see the spec for where your variables are placed and how to split them up. I think there is an error code for that, but I don't know exactly which one.
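
One way to act on "how to split them" is to allocate the storage on the host as a single buffer and pass it to the kernel as a __global pointer, with each work-item addressing its own slice. The plain C sketch below only illustrates the index arithmetic for such a flat layout. The function names, the row-major order, and the idea of one contiguous slice per work-item are assumptions for illustration, not something taken from the original code.

```c
#include <stddef.h>

/* Number of elements in one work-item's slice: rows x cols. */
size_t slice_elems(size_t rows, size_t cols)
{
    return rows * cols;
}

/* Flat index of element [i][j] inside the slice owned by work-item `gid`,
 * assuming row-major order and one contiguous slice per work-item. */
size_t flat_index(size_t gid, size_t i, size_t j, size_t rows, size_t cols)
{
    return gid * slice_elems(rows, cols) + i * cols + j;
}

/* Total elements the host buffer must hold for `work_items` work-items. */
size_t total_elems(size_t work_items, size_t rows, size_t cols)
{
    return work_items * slice_elems(rows, cols);
}
```

Inside the kernel, lista[i][j] would then become something like lista[flat_index(get_global_id(0), i, j, 1344, TAM_LISTA)], with lista passed in as a __global struct _intervalo pointer. Keep in mind that the buffer grows with the number of work-items: total_elems(4, 1344, 500) is already 2,688,000 elements.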

Hi,

First of all, no errors occur.

The data type is just two floats. I think that is very small and can be allocated in private memory. Am I wrong?

Thank you very much for your help.

I forgot to say that this example is running on the CPU.

Indeed, I have to run it on the CPU; the GPU version is separate code.

It is very strange that this error happens on the CPU.

Thanks again.

On the CPU, private and local memory are very big; on the GPU they are much smaller. On my GTX680, local memory is 48k for all SPUs, so it's 4k per SPU. Your array takes 1344 * TAM_LISTA * 2 * 4 bytes (about 10.5k * TAM_LISTA, or roughly 5.4 MB with TAM_LISTA = 500). I don't think this will fit.
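
The arithmetic above can be checked with a few lines of plain C. This is only a sketch: it assumes the struct holds exactly two 4-byte floats with no padding (as described earlier in the thread), TAM_LISTA = 500, and the 48 KB local memory figure quoted for the GTX680. The helper names lista_bytes, fits_in and report are made up for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed layout: two floats, as described in the thread. */
struct intervalo {
    float inferior;
    float superior;
};

/* Bytes needed by one lista[rows][tam_lista] array (hypothetical helper). */
size_t lista_bytes(size_t rows, size_t tam_lista)
{
    return rows * tam_lista * sizeof(struct intervalo);
}

/* Returns 1 if `bytes` fits in a memory region of `limit` bytes, else 0. */
int fits_in(size_t bytes, size_t limit)
{
    return bytes <= limit;
}

/* Prints the footprint of the array and whether it fits in `limit` bytes. */
void report(size_t rows, size_t tam_lista, size_t limit)
{
    size_t bytes = lista_bytes(rows, tam_lista);
    printf("%zu x %zu -> %zu bytes, fits in %zu: %s\n",
           rows, tam_lista, bytes, limit,
           fits_in(bytes, limit) ? "yes" : "no");
}
```

Calling report(1344, 500, 48 * 1024) from a main function shows that a single copy of the array needs 5,376,000 bytes, which is far beyond 48 KB. Since the array is declared inside the kernel, every work-item gets its own copy, so the real requirement is that figure multiplied by the number of work-items.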

Does this not fit on the CPU?

I have read some books saying that OpenCL maps private memory on the CPU to the L1 cache/registers. But in some tests I ran, I allocated more than the maximum L1 size on the CPU.

I don't know what to think about that now. hehehe

Thanks,

If you have Visual Studio with CUDA and the Intel OpenCL SDK installed, you can have a look at the Nsight system info window for OpenCL. It tells me the following:


The first column is an i7 (12 cores), the second one is a GTX670:

Property                               i7 (12 cores)   GTX670
CL_DEVICE_LOCAL_MEM_SIZE (bytes)       32768           49152
CL_DEVICE_LOCAL_MEM_TYPE               Global          Local
CL_DEVICE_MAX_CLOCK_FREQUENCY (MHz)    3470            1564
CL_DEVICE_MAX_COMPUTE_UNITS            12              16