How do I pass 7 arrays to the device for kernel processing?

Warning: I’m an OpenCL newbie and an intermediate C++ programmer. Feel free to assume that I need to be spoken to like a child.

I’m writing an OpenCL program to look at a set of atoms (with x,y,z coordinates) and a set of rays. In C++ I made 2 for-loops: for every ray, I looked at every atom to see if it touches the ray, and if so I got the distance and saved the smallest distance. I’ve tried to convert it to OpenCL, but I’m certain I’ve missed some important points.

Previous version:
for(i=0;i<RAY_ARRAY_SIZE;i++)
{
    smallestDistance[i] = 999.;
    for(j=0;j<ATOMS_ARRAY_SIZE;j++)
    {
        if(ray i touches atom j)
        {
            a = distance to atom;
            if(a<smallestDistance[i])
                smallestDistance[i] = a;
        }
    }
}

So, I’m assuming I can do this in OpenCL by using 2 dimensions.

One dimension has rays indexed by i. Second dimension has atoms indexed by j.

I’ll make arrays for rays:
cl_mem double phi[RAY_ARRAY_SIZE], psi[RAY_ARRAY_SIZE], and rho[RAY_ARRAY_SIZE];

and arrays for atoms:
cl_mem double atomX[ATOMS_ARRAY_SIZE], atomY[ATOMS_ARRAY_SIZE], atomZ[ATOMS_ARRAY_SIZE], atomRadii[ATOMS_ARRAY_SIZE];

Define kernel arguments:
errNum = clSetKernelArg(kernel, 0, RAY_ARRAY_SIZE*sizeof(double), phi);
errNum |= clSetKernelArg(kernel, 1, RAY_ARRAY_SIZE*sizeof(double), psi);
errNum |= clSetKernelArg(kernel, 2, RAY_ARRAY_SIZE*sizeof(double), rho);
errNum |= clSetKernelArg(kernel, 3, ATOMS_ARRAY_SIZE*sizeof(double), atomX);
errNum |= clSetKernelArg(kernel, 4, ATOMS_ARRAY_SIZE*sizeof(double), atomY);
errNum |= clSetKernelArg(kernel, 5, ATOMS_ARRAY_SIZE*sizeof(double), atomZ);
errNum |= clSetKernelArg(kernel, 6, ATOMS_ARRAY_SIZE*sizeof(double), *atomRadii);
if (errNum != CL_SUCCESS)
{
cerr << "Error setting kernel arguments." << endl;
return 1;
}

Define memory passed to and from the GPU:
err = clEnqueueWriteBuffer(queue, phi, CL_TRUE, 0, RAY_ARRAY_SIZE*sizeof(double), phi, 0, NULL, NULL);
err |= clEnqueueWriteBuffer(queue, psi, CL_TRUE, 0, RAY_ARRAY_SIZE*sizeof(double), psi, 0, NULL, NULL);
err |= clEnqueueWriteBuffer(queue, rho, CL_TRUE, 0, RAY_ARRAY_SIZE*sizeof(double), rho, 0, NULL, NULL);
err |= clEnqueueWriteBuffer(queue, atomX, CL_TRUE, 0, ATOMS_ARRAY_SIZE*sizeof(double), atomX, 0, NULL, NULL);
err |= clEnqueueWriteBuffer(queue, atomY, CL_TRUE, 0, ATOMS_ARRAY_SIZE*sizeof(double), atomY, 0, NULL, NULL);
err |= clEnqueueWriteBuffer(queue, atomZ, CL_TRUE, 0, ATOMS_ARRAY_SIZE*sizeof(double), atomZ, 0, NULL, NULL);
err |= clEnqueueWriteBuffer(queue, atomRadii, CL_TRUE, 0, ATOMS_ARRAY_SIZE*sizeof(double), atomRadii, 0, NULL, NULL);
if (err != CL_SUCCESS)
{
cerr << "Error passing gpu memory." << endl;
return 1;
}

Is this the appropriate way to do this? Any advice would be appreciated. I know I’m doing clEnqueueWriteBuffer wrong because buffer and pointer should be different things, but I wasn’t sure which is which.
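
On the buffer-versus-pointer point, here is a minimal host-side sketch rather than a definitive implementation. It assumes the context, queue and kernel already exist, that the host arrays are plain double arrays, and that the kernel declares its first seven parameters as __global const double* in the order phi, psi, rho, atomX, atomY, atomZ, atomRadii. The helper names uploadDoubles and setUpInputs are made up for illustration; the cl* calls are the standard OpenCL C API. It needs the OpenCL headers and a device that supports double to build and run.

```cpp
#include <CL/cl.h>   // on macOS the header is <OpenCL/opencl.h>
#include <cstddef>

// Hypothetical helper (not from the original post): creates a read-only
// device buffer and copies `count` doubles from host memory into it.
static cl_int uploadDoubles(cl_context context, cl_command_queue queue,
                            const double* hostData, std::size_t count,
                            cl_mem* bufferOut)
{
    cl_int err = CL_SUCCESS;
    const std::size_t bytes = count * sizeof(double);

    // cl_mem is an opaque handle to device memory, not a host array.
    *bufferOut = clCreateBuffer(context, CL_MEM_READ_ONLY, bytes, NULL, &err);
    if (err != CL_SUCCESS)
        return err;

    // Second argument: destination buffer (cl_mem).
    // Sixth argument: source pointer in host memory.
    return clEnqueueWriteBuffer(queue, *bufferOut, CL_TRUE, 0, bytes,
                                hostData, 0, NULL, NULL);
}

// Hypothetical helper: uploads the seven input arrays and binds them to
// kernel arguments 0..6. On failure, any non-NULL entries of `buffers`
// are left for the caller to release with clReleaseMemObject.
cl_int setUpInputs(cl_context context, cl_command_queue queue, cl_kernel kernel,
                   const double* phi, const double* psi, const double* rho,
                   std::size_t rayCount,
                   const double* atomX, const double* atomY,
                   const double* atomZ, const double* atomRadii,
                   std::size_t atomCount,
                   cl_mem buffers[7])
{
    const double* host[7] = { phi, psi, rho, atomX, atomY, atomZ, atomRadii };
    const std::size_t counts[7] = { rayCount, rayCount, rayCount,
                                    atomCount, atomCount, atomCount, atomCount };

    for (cl_uint k = 0; k < 7; ++k)
        buffers[k] = NULL;

    for (cl_uint k = 0; k < 7; ++k) {
        cl_int err = uploadDoubles(context, queue, host[k], counts[k], &buffers[k]);
        if (err != CL_SUCCESS)
            return err;

        // For a buffer argument, the size is sizeof(cl_mem) and the value is
        // the address of the handle, not the array size or the data itself.
        err = clSetKernelArg(kernel, k, sizeof(cl_mem), &buffers[k]);
        if (err != CL_SUCCESS)
            return err;
    }
    return CL_SUCCESS;
}
```

The point of the sketch is the split: the host keeps ordinary double arrays, each one gets its own cl_mem handle from clCreateBuffer, clEnqueueWriteBuffer copies host data into that handle, and clSetKernelArg receives the handle. The smallestDistance output would be one more buffer, created as CL_MEM_WRITE_ONLY and copied back with clEnqueueReadBuffer after the kernel has run.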

Looking at your algorithm, I recommend a one-dimensional work range, probably along your ray array, with one work item per ray. If you implemented this as a 2D work range with a work item for every i,j pair, the test “a<smallestDistance[i]” would become a global ‘reduce’ operation, because every work item for ray i would read and update the same smallestDistance[i]. That would take global synchronization, which OpenCL does not support across work-groups inside a kernel, and it would not be efficient even if it were supported.

Instead, each ray work item could check every atom, retaining the inner for loop in your OpenCL kernel.
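
As a rough sketch of that one-work-item-per-ray layout, here is the per-ray logic written as plain C++ so it can be checked on the CPU first. The intersection test is a placeholder: the thread never spells out how "ray i touches atom j" is computed, so the HitTest signature and the names smallestDistanceForRay and runAllRays are assumptions for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for "ray i touches atom j". It returns true and
// writes the distance when the ray hits the atom. The real geometry is not
// given in the thread, so this signature is an assumption.
using HitTest = bool (*)(std::size_t ray, std::size_t atom, double* distance);

// Work done by a single work item. In an OpenCL kernel, `ray` would come
// from get_global_id(0) and the result would be written to smallestDistance[ray].
double smallestDistanceForRay(std::size_t ray, std::size_t atomCount, HitTest hit)
{
    double smallest = 999.0;  // same sentinel as the original C++ loop
    for (std::size_t atom = 0; atom < atomCount; ++atom) {
        double d = 0.0;
        // The minimum lives in a private variable, so no other work item
        // ever touches it and no synchronization is needed.
        if (hit(ray, atom, &d) && d < smallest)
            smallest = d;
    }
    return smallest;
}

// CPU stand-in for launching a 1D range of rayCount work items.
std::vector<double> runAllRays(std::size_t rayCount, std::size_t atomCount, HitTest hit)
{
    std::vector<double> result(rayCount);
    for (std::size_t ray = 0; ray < rayCount; ++ray)
        result[ray] = smallestDistanceForRay(ray, atomCount, hit);
    return result;
}
```

In the kernel version the outer loop in runAllRays disappears: the kernel is enqueued with a work dimension of 1 and a global size equal to the number of rays, and each work item runs the body of smallestDistanceForRay for its own ray index.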