I have the following kernel:
__kernel void Euler(__global float *field1,
                    const int iteration)
{ ... }
I call it in a loop, setting a different kernel argument on each iteration:
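// kernel, queue and global_size are placeholder names for objects created elsewhere;
// a 1-D range is assumed here.
for (cl_int i = 0; i < 1000; i++) {
    clSetKernelArg(kernel, 1, sizeof(cl_int), &i); // set argument "iteration" to the current iteration
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL, 0, NULL, NULL);
    clFinish(queue); // block until the kernel has finished
}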
Can somebody explain why it is faster to set an integer argument (constant) than to set
a memory object (global)? I think it has to do with the architecture of the graphics card
and its different memories (global, constant), or with the communication bus between host and device (PCIe, DMA).
Thanks