I’d suggest rethinking the structure of your application instead. Whatever you do, such an operation will be costly. Is there a way to split it into a few calls of a simpler kernel and then aggregate the results? If that is not feasible, here is the concept:
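// MAX_NUMBER_OF_ARGS is assumed to be #defined elsewhere (3 for this kernel).
typedef struct {
    __global int* current_ptr; // buffer currently being walked, NULL once everything is consumed
    int argument;              // index of that buffer among the kernel arguments
    int element;               // index of the current element inside that buffer
} iterator;

iterator newIterator(__global int* arg0){
    iterator output;
    output.current_ptr = arg0;
    output.argument = 0;
    output.element = 0;
    return output;
}

iterator nextIterator(iterator old, __constant int* sizes_of_buffers, __global int* arg0, __global int* arg1, __global int* arg2){
    iterator output = old;
    if (old.element + 1 < sizes_of_buffers[old.argument]){
        output.element++; // still inside the current buffer
    } else {
        output.element = 0;
        output.argument = old.argument + 1;
        if (output.argument == MAX_NUMBER_OF_ARGS || sizes_of_buffers[output.argument] == 0) // the array of sizes must be zero-terminated
            output.current_ptr = NULL;
        else switch (output.argument){
            case 1: output.current_ptr = arg1; break;
            case 2: output.current_ptr = arg2; break;
        }
    }
    return output;
}

bool isNULL(iterator a){
    return (a.current_ptr == NULL);
}

__global int* access(iterator a){
    return a.current_ptr + a.element;
}

// sizes_of_buffers holds MAX_NUMBER_OF_ARGS+1 entries; the helpers above must be defined before the kernel.
__kernel void myKern(__constant int* sizes_of_buffers, __global int* arg0, __global int* arg1, __global int* arg2)
{
    // the loop shown below goes here
}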
So, here is how this works. First, you create an iterator like so:
iterator i = newIterator(arg0);
And then you can make a loop like so:
int accum = 0;
while (!isNULL(i)){
    accum += *access(i);
    i = nextIterator(i, sizes_of_buffers, arg0, arg1, arg2);
}
It’s gonna be a pain to call nextIterator with that many parameters everywhere, so while writing the kernel you may want to replace the "sizes_of_buffer… " part with a placeholder such as the @ symbol. When you’re done with the kernel, use some tool (or simply your IDE’s “Find and replace”) to put the real argument list back where it belongs.
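If a find-and-replace pass feels fragile, a preprocessor macro can probably do the same substitution for you. Here is a minimal sketch in plain C rather than OpenCL C, so it can be run on the host to check the iteration logic. The names NEXT and sumAll are made up for this example, the address-space qualifiers are dropped because plain C has none, and a small lookup table stands in for the switch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NUMBER_OF_ARGS 3 /* assumption: three buffers, as in myKern */

typedef struct {
    const int *current_ptr; /* buffer being walked, NULL once finished */
    int argument;           /* index of that buffer */
    int element;            /* index inside that buffer */
} iterator;

static iterator newIterator(const int *arg0) {
    iterator it = { arg0, 0, 0 };
    return it;
}

static iterator nextIterator(iterator old, const int *sizes_of_buffers,
                             const int *arg0, const int *arg1, const int *arg2) {
    const int *buffers[MAX_NUMBER_OF_ARGS] = { arg0, arg1, arg2 };
    iterator out = old;
    if (old.element + 1 < sizes_of_buffers[old.argument]) {
        out.element++; /* still inside the current buffer */
        return out;
    }
    out.element = 0;
    out.argument = old.argument + 1;
    /* stop at the last argument or at the zero terminator of the size array */
    if (out.argument == MAX_NUMBER_OF_ARGS || sizes_of_buffers[out.argument] == 0)
        out.current_ptr = NULL;
    else
        out.current_ptr = buffers[out.argument];
    return out;
}

/* Hypothetical helper macro: expands to the full call. It relies on
   sizes_of_buffers, arg0, arg1 and arg2 being in scope at the call site. */
#define NEXT(it) nextIterator((it), sizes_of_buffers, arg0, arg1, arg2)

static int sumAll(const int *sizes_of_buffers,
                  const int *arg0, const int *arg1, const int *arg2) {
    int accum = 0;
    assert(sizes_of_buffers[0] > 0); /* this sketch assumes a non-empty first buffer */
    for (iterator i = newIterator(arg0); i.current_ptr != NULL; i = NEXT(i))
        accum += i.current_ptr[i.element];
    return accum;
}
```

The macro only works where the four names it mentions are visible, which is the case inside the kernel body. OpenCL C has a preprocessor as well, so the same #define should carry over, but treat that as something to verify with your compiler.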
You’ll need to adjust this code to fit your needs (or to make it compile, for that matter), but hopefully the idea is clear enough. And I warn you once again: the compiler will not be happy.
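For comparison, here is roughly what the first suggestion (several calls of a simpler kernel, then aggregation) looks like. This is only a plain C sketch under stated assumptions: sumBuffer is a made-up stand-in for one launch of a simple single-buffer kernel, and sumBuffers plays the role of the host code that combines the partial results. A real version would enqueue a kernel per buffer and read the results back.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for one launch of a simple kernel that handles a single buffer. */
static int sumBuffer(const int *buffer, size_t size) {
    int accum = 0;
    for (size_t k = 0; k < size; ++k)
        accum += buffer[k];
    return accum;
}

/* Host side: one call per buffer, then aggregate the partial results. */
static int sumBuffers(const int *buffers[], const size_t sizes[], size_t count) {
    int total = 0;
    assert(buffers != NULL && sizes != NULL);
    for (size_t b = 0; b < count; ++b)
        total += sumBuffer(buffers[b], sizes[b]);
    return total;
}
```

No iterator and no pointer switching is needed inside the kernel this way, which is why I’d try this route first.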