local declaration inside the kernel

What happens when you declare a variable as local inside the kernel itself? I have some code that works, but I want to make it faster, so I want to copy the global input array into a local tempBuffer. I can create the variable, but as soon as I write to it, the kernel fails at runtime and I can't get any errors. More specifically:


__kernel void simple(
	global const float* input1,
	global float* output,
	constant float* input2,
	private int numData,
	private int numData2)
{
	int index = get_global_id(0);

	local float tempBuffer[90000];
	// As soon as I add the next line, whether as async_work_group_copy or indexed
	// directly, the kernel fails and returns wonky values.

	tempBuffer[index] = index; // <-- broken

	write_mem_fence(CLK_GLOBAL_MEM_FENCE); // so we can use later
	// ...
}

I have it outputting an array of values that tells me whether or not it worked, and after that line it doesn't. How can I fix this?

I even tried changing the output array to a local variable, and it spat out random values, or the values from the previous run, but not the ones from the current run.

Hey gamingdrake,

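When you write to local memory, the fence

write_mem_fence(CLK_GLOBAL_MEM_FENCE); // so we can use later

does not work. The CLK_GLOBAL_MEM_FENCE flag orders global memory accesses. You have to use CLK_LOCAL_MEM_FENCE. Also note that write_mem_fence only orders the writes of the calling work-item; if other work-items in the group are going to read tempBuffer, use barrier(CLK_LOCAL_MEM_FENCE) instead.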
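A minimal sketch of the point above, assuming a one-dimensional launch with a work-group size of at most 256. The kernel name and TILE are made up for illustration and are not part of the original code. It differs from the kernel in the question in two ways: the local array is indexed with get_local_id(0) rather than the global ID, and the work-group is synchronized with barrier(CLK_LOCAL_MEM_FENCE) before any work-item reads a slot that another work-item wrote.

```c
#define TILE 256  // hypothetical tile size; assumed >= the work-group size

__kernel void local_fence_sketch(__global const float* input1,
                                 __global float* output,
                                 int numData)
{
    int gid = get_global_id(0);   // position in the whole NDRange
    int lid = get_local_id(0);    // position inside this work-group
    int lsz = get_local_size(0);  // work-group size, assumed <= TILE

    // One tile per work-group; each work-item owns slot 'lid'.
    __local float tempBuffer[TILE];

    // Pad with 0.0f past the end of the input so every slot is initialized.
    tempBuffer[lid] = (gid < numData) ? input1[gid] : 0.0f;

    // Every work-item in the group must reach this barrier. The LOCAL flag
    // makes the writes to tempBuffer visible to the rest of the group.
    barrier(CLK_LOCAL_MEM_FENCE);

    // Read a slot written by a different work-item, then store the result
    // in global memory, which is the only memory the host can read back.
    int neighbor = (lid + 1) % lsz;
    if (gid < numData)
        output[gid] = tempBuffer[neighbor];
}
```

The barrier sits outside the bounds check on purpose: a barrier placed inside a branch that only some work-items of the group take leads to undefined behavior.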

I have it outputting an array of values that tells me whether or not it worked, and after that line it doesn't.

I don’t understand you here.

What I think you mean is that you are trying to download tempBuffer to the host. The host cannot access the local memory of the GPU, though. You first have to write the contents of tempBuffer back to global memory; then the host can access the data.

I hope that helps.
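The write-back described above could look roughly like the following sketch. It assumes a one-dimensional launch without a global offset, and it assumes the host sizes the local buffer per work-group by passing a size and a NULL pointer to clSetKernelArg for the tile argument. The kernel name, the tile argument, and the "multiply by 2" step are placeholders for illustration.

```c
__kernel void stage_and_write_back(__global const float* input1,
                                   __global float* output,
                                   __local float* tile,  // sized by the host, one per work-group
                                   int numData)
{
    int lid = get_local_id(0);
    int lsz = get_local_size(0);
    int groupStart = get_group_id(0) * lsz;  // first element handled by this group

    // The last group may be only partly inside the input; clamp the copy length.
    int count = min(lsz, numData - groupStart);
    if (count < 0)
        count = 0;  // the whole group lies past the end of the input

    // All work-items of the group issue the same copy with the same arguments,
    // then wait for it to finish before touching the tile.
    event_t e = async_work_group_copy(tile, input1 + groupStart, (size_t)count, 0);
    wait_group_events(1, &e);

    // Placeholder computation on the staged data, written back to global
    // memory so that the host can read the result from 'output'.
    if (lid < count)
        output[groupStart + lid] = 2.0f * tile[lid];
}
```

On the host side, the local buffer in this sketch would be requested with something like clSetKernelArg(kernel, 2, localSize * sizeof(float), NULL), so only one work-group's worth of data lives in local memory at a time. That also matters for the kernel in the question: 90000 floats is 360,000 bytes, and the local memory a device reports for CL_DEVICE_LOCAL_MEM_SIZE is often only tens of kilobytes, which may be why only small local arrays worked.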

Thank you for your answer.
When I say:

I have it outputting an array of values that tells me whether or not it worked, and after that line it doesn't.

it means that I have output coming back from the GPU. In the first case, before it breaks, I tell the kernel to write 32 to every element of “global float* output”. After it breaks, I change the kernel to write 77777 to every element, and the output still contains 32.

As for the CLK_GLOBAL_MEM_FENCE, I have tried using CLK_LOCAL_MEM_FENCE, but it does the same thing. And because my output is a global buffer, I should still be able to write to it and read the values back, whether or not the fence works. It seems that anything local kills my kernel, unless it's of size 10x10 or something similarly small.

Hey Gamingdrake,

The host cannot access local memory. Therefore, you cannot use a local memory array to upload or download data.

See your other topic (viewtopic.php?f=28&t=4110) for a few more details.