Hi,
I want to allocate an array in GPU memory with every element initialized to a given value (for instance, 0).
I tried creating a zero-filled array on the CPU and then creating the buffer from it with the CL_MEM_COPY_HOST_PTR flag.
I also tried running a kernel with one thread per array element, where each thread just assigns the given value.
The array is huge, by the way.
I wonder if there is a more efficient way of doing this.
Thank you.
Someone correct me if I’m wrong, but you could allocate the cl_mem buffer on the device with a NULL host pointer, then run a kernel that zeroes all the array elements before running your actual kernel. That way you don’t have to allocate and zero the array on the host and then transfer it from host to device, which together could take a lot of time.
That’s what I was trying to explain (poorly, apparently) when I said:
Thanks anyway.
Ah, I’m sorry, I misunderstood.
I guess another option (besides clEnqueueWriteBuffer) is to use clEnqueueMapBuffer to write to the device memory through a host pointer. There are examples of how to do this in the AMD and NVIDIA SDKs under memory optimizations/bandwidth (PCIe). If you do the write asynchronously, you can overlap other computations with the transfer to the buffer.