Memory Access

I have the following kernel:


__kernel void Euler
(__global float * field1,
 const int iteration)
{ ... }

I call it in a loop, and each time a different kernel argument is set:


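// kernel, queue and global_size are assumed to be created elsewhere
for (int i = 0; i < 1000; i++) {
   // set argument "iteration" (index 1) to the current iteration
   clSetKernelArg(kernel, 1, sizeof(int), &i);
   clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL, 0, NULL, NULL);
   clFinish(queue);
}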

Can somebody explain why it is faster to set an integer argument (constant) than to set
a memory object (global)? I think it has to do with the architecture of the graphics card
and its different memories (global, constant), or with the communication bus from host to device (PCIe, DMA).

Thanks

Memory objects need extra processing because the object handle on the host side has to be translated into a device pointer on the kernel side. Furthermore, the memory object’s contents may need to be transferred across the PCIe bus (and possibly back).

A value argument (like an int) is just copied into a buffer and sent as a collection of bytes to the device.
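
To make the difference concrete, here is a rough toy model in plain C of the two paths described above. It is only a sketch and not how any real OpenCL driver is implemented: the types mem_object and arg_block, the functions set_value_arg and set_mem_arg, the handle table, and the "resident" flag are all hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model, NOT a real OpenCL runtime. */
#define ARG_BUF_SIZE    64
#define MAX_MEM_OBJECTS 8

typedef struct {
    int      in_use;       /* slot holds a live buffer object */
    int      resident;     /* contents already present on the device? */
    uint64_t device_addr;  /* address the kernel would see */
} mem_object;

typedef struct {
    unsigned char bytes[ARG_BUF_SIZE]; /* packed arguments sent with the launch */
    size_t        used;                /* bytes of the block filled so far */
    int           pending_transfers;   /* buffers that must cross the bus first */
} arg_block;

/* Value argument: the bytes are copied into the argument block as-is. */
static int set_value_arg(arg_block *ab, const void *value, size_t size)
{
    if (ab->used + size > ARG_BUF_SIZE)
        return -1;                     /* argument block is full */
    memcpy(ab->bytes + ab->used, value, size);
    ab->used += size;
    return 0;
}

/* Memory-object argument: validate the host-side handle, translate it to a
 * device address, and schedule a transfer if the data is not yet resident. */
static int set_mem_arg(arg_block *ab, mem_object *table, int handle)
{
    if (handle < 0 || handle >= MAX_MEM_OBJECTS || !table[handle].in_use)
        return -1;                     /* unknown or released handle */
    if (!table[handle].resident) {
        ab->pending_transfers++;       /* stands in for a host-to-device copy */
        table[handle].resident = 1;
    }
    /* Only the translated device address ends up in the argument block. */
    return set_value_arg(ab, &table[handle].device_addr, sizeof(uint64_t));
}
```

In this model the int takes a single memcpy, while the buffer argument goes through a lookup, a validity check, and possibly a scheduled transfer before its address is packed. That extra bookkeeping is the kind of work the answer refers to.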