Fast array initialization

Hi,

I want to allocate an array in GPU memory with every element initialized to a given value (for instance, 0).

I tried creating a zero-filled array on the CPU and then creating the buffer from it with the CL_MEM_COPY_HOST_PTR flag.

I also tried running a kernel with one thread per array element, where each thread simply assigns the given value.

The array is huge, by the way.

I wonder if there is a more efficient way of doing this.

Thank you.

Someone can correct me if I’m wrong, but you could allocate the cl_mem buffer on the device with a NULL host pointer, then run a kernel that zeroes all the array elements before running your actual kernel. That way you don’t have to allocate and zero the array on the host and then transfer it from host to device, which together could take a lot of time.

That’s what I tried to explain (badly, apparently) when I mentioned running a kernel with one thread per array element.

Thanks anyway.

Ah, sorry, I misunderstood.

I guess another option (other than clEnqueueWriteBuffer) is to use clEnqueueMapBuffer to write to the device memory through a host pointer. There are examples of how to do this in the AMD and NVIDIA SDKs, in the memory optimization/bandwidth (PCIe) samples. If the write is asynchronous (non-blocking), you can overlap other computation with the transfer.

Nice one!! I’ll try it.

Thank you very much.