I'm currently considering writing an application that would use OpenCL to offload atom distance calculations to the GPU (specifically, for Monte Carlo simulations of protein folding), and I have a couple of questions about this.

To describe the problem briefly: in host memory I have a float array of 3 * N elements that stores the atom coordinates, and another float array of N * N elements that stores the distances between all atom pairs.

Ideally, I'd like to write an OpenCL kernel that takes the coordinates array as input and writes the pairwise distances into the distances array. Since these arrays are also used in other parts of the host code (which does not assume OpenCL acceleration), I'd like to map them onto OpenCL memory objects in device memory, so as to avoid potentially expensive copies from host to device memory and back.
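One possible shape for such a kernel, sketched under assumptions: one work-item per (i, j) pair on an N x N two-dimensional NDRange, interleaved x, y, z coordinates per atom, and a row-major distance matrix. The names `pair_distances`, `pair_distance_kernel_src` and `round_up_global` are hypothetical. The OpenCL C source is kept in a C string so it could be passed to clCreateProgramWithSource(). In OpenCL 1.x the global size must be a multiple of the work-group size when one is given, so the helper rounds N up, and the kernel skips the padded indices.

```c
#include <stddef.h>

/* OpenCL C source: vload3(k, p) reads p[3k], p[3k+1], p[3k+2], and
   distance() is the built-in Euclidean distance for float3 vectors. */
static const char *pair_distance_kernel_src =
    "__kernel void pair_distances(__global const float *coords,\n"
    "                             __global float *dist,\n"
    "                             const int n)\n"
    "{\n"
    "    const int i = get_global_id(0);\n"
    "    const int j = get_global_id(1);\n"
    "    if (i >= n || j >= n) return;\n"
    "    const float3 a = vload3(i, coords);\n"
    "    const float3 b = vload3(j, coords);\n"
    "    dist[i * n + j] = distance(a, b);\n"
    "}\n";

/* Round n up to a multiple of the work-group size `local`, for use as one
   dimension of the global work size. */
static size_t round_up_global(size_t n, size_t local)
{
    return (n + local - 1) / local * local;
}
```

Since dist[i*n + j] equals dist[j*n + i], a variant that fills only one triangle would do roughly half the arithmetic; the full-matrix version above is simply easier to index.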

After reading the latest OpenCL reference, it seems to me that there are at least two ways of achieving this:

1) Create the buffer objects by passing CL_MEM_USE_HOST_PTR to clCreateBuffer(), together with the pointers to the arrays in host memory. The memory objects should then use the memory referenced by the host pointers as their storage.

2) Create the buffers in device memory and then map them into the host address space with clEnqueueMapBuffer(). In this case, the host-side coordinates and distances pointers would hold the pointers returned by the calls to clEnqueueMapBuffer().

First of all: am I correct in arriving at these two conclusions? If so, which method should be preferred?

Many thanks!