Hello all,
I’m having a bit of trouble understanding what my work group size and number of work items should be. Beyond that, I’m having trouble just finding out how large these can be for the hardware I have.
The problem I’m trying to parallelize comes down to factoring a very large number that has only two factors (other than 1 and itself). The kernel will only ever have one number at a time to factor. Does this mean that I can have a single 1D work group, or is there some benefit to making this 2D or 3D?
I have been successful in finding the max work group size via CL_DEVICE_MAX_WORK_GROUP_SIZE. However, if I try to use that as the work group size and 1 as the number of work groups, I get a CL_INVALID_WORK_GROUP_SIZE error from enqueueNDRangeKernel(). Here is what I tried:
    size_t groupSize;
    devices[0].getInfo((cl_device_info)CL_DEVICE_MAX_WORK_GROUP_SIZE, &groupSize);
    ...
    err = queue.enqueueNDRangeKernel(
        kernel,
        cl::NullRange,
        cl::NDRange(1),
        cl::NDRange(groupSize),
        NULL,
        &event);
The code I am writing will be run on multiple computers, so I need some way of dynamically calculating both the work group size and the number of work groups. Also, am I correct in thinking that the total number of concurrently running “threads” = the number of work groups * the number of work items per work group? I’m not sure “thread” is the correct word there, but essentially I mean the number of instances executing the specified kernel.
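For reference, here is a minimal sketch of the sizing arithmetic being asked about. It assumes the OpenCL 1.x convention that the global size passed to enqueueNDRangeKernel() is the total number of work items (not the number of work groups) and must be a multiple of the local size. Under that assumption, a global size of 1 with a local size of groupSize would not divide evenly, which may be what triggers CL_INVALID_WORK_GROUP_SIZE. The names LaunchSize and chooseLaunchSize are hypothetical helpers, not part of the OpenCL API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type (not part of OpenCL): the two sizes that would
// be handed to enqueueNDRangeKernel as cl::NDRange(global), cl::NDRange(local).
struct LaunchSize {
    std::size_t global; // total number of work items, padded up
    std::size_t local;  // work items per work group
};

// Pick a local size no larger than the queried limit, then pad the global
// size up to the next multiple of that local size.
// Assumption: maxGroupSize is the value queried at run time, e.g. from
// CL_DEVICE_MAX_WORK_GROUP_SIZE.
LaunchSize chooseLaunchSize(std::size_t workItemsNeeded, std::size_t maxGroupSize) {
    assert(workItemsNeeded > 0 && maxGroupSize > 0);
    std::size_t local = workItemsNeeded < maxGroupSize ? workItemsNeeded : maxGroupSize;
    std::size_t groups = (workItemsNeeded + local - 1) / local; // ceiling division
    return LaunchSize{groups * local, local};
}
```

For example, chooseLaunchSize(1000, 256) gives a local size of 256 and a global size of 1024, i.e. 4 work groups. The 24 padding work items would need a bounds check inside the kernel. The limit for a specific kernel (CL_KERNEL_WORK_GROUP_SIZE) can be lower than the device-wide maximum, so that value may be the safer one to pass as maxGroupSize.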
I really appreciate any help.