I can't quite figure out the best way to sum up all the elements of a buffer.
I have tried the following kernels, but neither sums up the buffer properly.
__kernel void vector_sum(__global int *A, __global int *sum) {
    // Get the index of the current element to be processed
    int i = get_global_id(0);
    // Do the operation
    sum[0] = A[i] + sum[0];
}

__kernel void vector_sum(__global int *A, int sum) {
    // Get the index of the current element to be processed
    int i = get_global_id(0);
    // Do the operation
    sum = A[i] + sum;
}
Does anyone have an idea for the best way to do this?
This is, not surprisingly, a very common problem on parallel architectures, one that is well studied and basically ‘solved’.
Search for ‘parallel reduction’ or ‘parallel prefix sum’: you will find dozens of hits for code and algorithms, which should point you in the right direction. Every vendor’s SDK includes examples as well.
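As a rough illustration of the tree-shaped pattern those searches turn up, here is a sketch in plain C that emulates a single work-group sequentially. The names group_sum and GROUP_SIZE are made up for this example, and the group size is assumed to be a power of two. In a real kernel the loops over lid go away (each work-item is one lid, obtained from get_local_id(0)), scratch would live in __local memory, and a barrier(CLK_LOCAL_MEM_FENCE) would be needed after the first phase and after each halving step.

```c
#include <stddef.h>

/* Assumed work-group size for this sketch; must be a power of two. */
#define GROUP_SIZE 8

/* Hypothetical helper: sums n ints the way one work-group would,
 * emulated sequentially on the host. */
int group_sum(const int *a, size_t n)
{
    int scratch[GROUP_SIZE]; /* stands in for __local memory */

    /* Phase 1: each "work-item" lid adds up its own strided slice,
     * so no two work-items ever write to the same location. */
    for (size_t lid = 0; lid < GROUP_SIZE; ++lid) {
        int acc = 0;
        for (size_t i = lid; i < n; i += GROUP_SIZE)
            acc += a[i];
        scratch[lid] = acc;
    }

    /* Phase 2: tree reduction. Each step halves the number of active
     * work-items; item lid folds in the partial sum at lid + stride. */
    for (size_t stride = GROUP_SIZE / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; ++lid)
            scratch[lid] += scratch[lid + stride];
    }

    /* After the last step the total sits in slot 0. */
    return scratch[0];
}
```

With several work-groups, each one would write its scratch[0] to an output buffer indexed by group, and those partial sums are then added up in a second pass or on the host.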
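Your proposed solutions indicate you’re not understanding the basic parallel execution programming model either. In the first kernel, every work-item reads and writes sum[0] at the same time with no synchronisation, so updates get lost. In the second, sum is passed by value, so each work-item only modifies its own private copy and nothing is ever written back. You need to acquaint yourself with that model if you have any hope of getting anywhere with OpenCL. The OpenCL specification and all of the vendor introductory material are good guides.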