Init Buffer Problem

I create a buffer on the GPU with

reocl->jointHist = clCreateBuffer(Ocl._GPUContext, CL_MEM_READ_WRITE, HISTOGRAMM_BIN_SIZE*HISTOGRAMM_BIN_SIZE*sizeof(cl_uint), NULL, &error);

Then I use a kernel to initialize the memory of this buffer with zeros:

error |= clSetKernelArg( Ocl._pGPUKernels[K_INIT_BUFFER], GPU_JHIST_RESULT, sizeof ( cl_mem ), &(reocl->jointHist));
error |= clEnqueueNDRangeKernel(Ocl._GPUCommandQueue, Ocl._pGPUKernels[K_INIT_BUFFER], 2, NULL, globalWorkSizeInit, NULL, 0, NULL, NULL);

Kernel Code:

__kernel void initBuffer2(__global uint *histogram)
{
	/* One work-item per element: row-major index into the 2D range. */
	histogram[get_global_id(0) * get_global_size(1) + get_global_id(1)] = 0u;
}

I already tried a one-dimensional version, but it didn’t work either. After the kernel has executed, I read the memory back with

clFinish(Ocl._GPUCommandQueue);
cl_uint *jhist1 = (cl_uint*) calloc(HISTOGRAMM_BIN_SIZE*HISTOGRAMM_BIN_SIZE, sizeof(cl_uint));
error |= clEnqueueReadBuffer(Ocl._GPUCommandQueue, reocl->jointHist, CL_TRUE, 0, HISTOGRAMM_BIN_SIZE*HISTOGRAMM_BIN_SIZE*sizeof(cl_uint), jhist1, 0, NULL, NULL);

The memory is not zero at all. What am I doing wrong? Is there something I forgot? I’ve been stuck on this problem the whole day and I don’t know how to resolve it. Any suggestions?
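To narrow down what "not zero" means, it can help to count the non-zero entries in the array that was read back and note where the first one is. The following is only a minimal host-side sketch in plain C: the helper name count_nonzero is made up for illustration, and uint32_t stands in for cl_uint (an assumption, so that the snippet compiles without the OpenCL headers).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts the non-zero entries in buf[0..n-1].
 * If at least one is found and first_bad is not NULL, the index of the
 * first non-zero entry is stored in *first_bad; otherwise *first_bad
 * is left unchanged. */
static size_t count_nonzero(const uint32_t *buf, size_t n, size_t *first_bad)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (buf[i] != 0u) {
            if (count == 0 && first_bad != NULL)
                *first_bad = i;
            ++count;
        }
    }
    return count;
}
```

Called on jhist1 right after clEnqueueReadBuffer, this would show whether every element is untouched or only part of the buffer. Checking the value of error after each call separately, instead of only OR-ing the codes together, may also show which call fails, if any.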

How is globalWorkSizeInit declared and initialized?

The 1D case should work fine and is a little simpler.
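As a sanity check on the indexing, here is a small host-side sketch in plain C (not OpenCL) that emulates both launch shapes: the 2D version with the same index expression as the kernel above, and a 1D version where the global size is the total element count and the index is just the global id. All names (BINS, emulate_init_2d, emulate_init_1d, leftover_after_init) are made up for illustration; BINS stands in for HISTOGRAMM_BIN_SIZE, and uint32_t stands in for cl_uint.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BINS 256 /* stand-in for HISTOGRAMM_BIN_SIZE (256 in this thread) */

/* Emulates the 2D launch: one loop iteration per (gid0, gid1) work-item,
 * using the same index expression as the kernel. */
static void emulate_init_2d(uint32_t *histogram, size_t size0, size_t size1)
{
    for (size_t gid0 = 0; gid0 < size0; ++gid0)
        for (size_t gid1 = 0; gid1 < size1; ++gid1)
            histogram[gid0 * size1 + gid1] = 0u;
}

/* Emulates a 1D launch: global size is the element count, index is gid. */
static void emulate_init_1d(uint32_t *histogram, size_t total)
{
    for (size_t gid = 0; gid < total; ++gid)
        histogram[gid] = 0u;
}

/* Fills a BINS*BINS array with a non-zero pattern, runs one emulation,
 * and returns how many entries are still non-zero afterwards.
 * Returns (size_t)-1 if the allocation fails. */
static size_t leftover_after_init(int use_2d)
{
    size_t total = (size_t)BINS * BINS;
    uint32_t *h = malloc(total * sizeof *h);
    if (h == NULL)
        return (size_t)-1;

    for (size_t i = 0; i < total; ++i)
        h[i] = 0xDEADBEEFu;

    if (use_2d)
        emulate_init_2d(h, BINS, BINS);
    else
        emulate_init_1d(h, total);

    size_t left = 0;
    for (size_t i = 0; i < total; ++i)
        if (h[i] != 0u)
            ++left;

    free(h);
    return left;
}
```

Both variants touch every element exactly once, so the index arithmetic by itself looks fine. If the real buffer still comes back non-zero, it may be worth looking outside the kernel, for example at the individual return codes or at the argument index passed to clSetKernelArg.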

Just curious: is it faster to run a “zeroing” kernel or to call clEnqueueWriteBuffer with a host array of zeros?

That’s a good question. I don’t know at the moment. But I read everywhere that you should avoid copying much data between the host and the GPU, so I thought it would be best to allocate and initialize memory that is only used on the GPU right there on the GPU.

At the moment I just allocate memory on the CPU and copy it to the GPU, and this works fine. I have to move on (I’m implementing mutual information on the GPU for my master’s thesis and I’m running out of time :wink: ).

globalWorkSizeInit is declared like this:

const size_t globalWorkSizeInit[2] = {HISTOGRAMM_BIN_SIZE,HISTOGRAMM_BIN_SIZE};

HISTOGRAMM_BIN_SIZE is defined elsewhere. Its value is 256.

Thank you for your replies.