Host overhead on data transfer?
I'm currently combining CPU and GPU computation with OpenCL, offloading some of the work to the GPU to reduce the CPU load.
There are several GPU blocks inside the CPU routines, so data has to be transferred between host and device. For faster transfer, I allocate the buffers with CL_MEM_ALLOC_HOST_PTR and map them to get a pointer on the host side.
However, when a data transfer occurs, the CPU load gets really high (I checked this with the 'top' command on Linux). In fact, the added overhead is almost the same as using memcpy() to copy the same amount of data on the CPU.
Is there any way to minimize host (CPU) usage during data transfer? Or is this an inevitable cost in this environment?
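For anyone who wants a finer-grained number than 'top' gives, here is a minimal sketch that compares wall-clock time against process CPU time around a copy. It uses a plain memcpy() as the baseline mentioned in the question; the names transfer_cost, seconds_on and measure_copy are made up for this illustration, and the same two clock reads could be placed around a map/unmap sequence instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical result type: elapsed wall-clock time and CPU time consumed. */
typedef struct {
    double wall_s;
    double cpu_s;
} transfer_cost;

/* Read a POSIX clock and return its value in seconds. */
static double seconds_on(clockid_t clock_id) {
    struct timespec ts;
    int rc = clock_gettime(clock_id, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Copy `bytes` from src to dst `repeats` times and report what it cost.
 * CLOCK_MONOTONIC gives elapsed time; CLOCK_PROCESS_CPUTIME_ID gives the
 * CPU time charged to this process over the same interval. */
static transfer_cost measure_copy(void *dst, const void *src,
                                  size_t bytes, int repeats) {
    assert(dst != NULL && src != NULL && bytes > 0 && repeats > 0);

    volatile unsigned char sink = 0; /* discourages eliding the copies */
    double wall0 = seconds_on(CLOCK_MONOTONIC);
    double cpu0 = seconds_on(CLOCK_PROCESS_CPUTIME_ID);

    for (int i = 0; i < repeats; ++i) {
        memcpy(dst, src, bytes);
        sink ^= ((const unsigned char *)dst)[bytes - 1];
    }
    (void)sink;

    transfer_cost cost;
    cost.cpu_s = seconds_on(CLOCK_PROCESS_CPUTIME_ID) - cpu0;
    cost.wall_s = seconds_on(CLOCK_MONOTONIC) - wall0;
    return cost;
}
```

As a rough reading: if cpu_s is close to wall_s, the CPU is most likely doing the copy itself (or busy-waiting); if the transfer is handled without the CPU, cpu_s would be expected to be well below wall_s. How a given driver behaves here is implementation-dependent.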
Last edited by hchoi1239; 08-06-2013 at 09:31 PM.
You didn't mention which implementation you're using (AMD, Intel or NVIDIA).
Try using CL_MEM_USE_HOST_PTR with a buffer allocated by the application, and have this buffer pinned/locked before the map (using mlock() or an equivalent API).
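A minimal sketch of the host side of that suggestion, assuming a POSIX system: allocate a page-aligned buffer, touch it, and lock it with mlock(). The names pinned_buffer, pinned_alloc and pinned_free are hypothetical helpers for this illustration. Page alignment is an assumption here; the alignment a given OpenCL implementation needs for zero-copy (if it supports it at all) is vendor-specific, so check the vendor's documentation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical wrapper around an application-owned, page-aligned buffer. */
typedef struct {
    void *ptr;    /* NULL if allocation failed */
    size_t bytes; /* requested size rounded up to whole pages */
    int locked;   /* 1 if mlock() succeeded, 0 otherwise */
} pinned_buffer;

static pinned_buffer pinned_alloc(size_t bytes) {
    pinned_buffer buf = { NULL, 0, 0 };
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || bytes == 0) {
        return buf;
    }

    /* Round up so the buffer covers whole pages. */
    size_t page_sz = (size_t)page;
    size_t rounded = (bytes + page_sz - 1) / page_sz * page_sz;

    void *p = NULL;
    if (posix_memalign(&p, page_sz, rounded) != 0) {
        return buf;
    }
    memset(p, 0, rounded); /* touch every page so it is actually backed */

    buf.ptr = p;
    buf.bytes = rounded;
    /* mlock() can fail (e.g. RLIMIT_MEMLOCK); the buffer is still usable. */
    buf.locked = (mlock(p, rounded) == 0);

    /* The buffer would then be handed to OpenCL, roughly as:
     *   clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
     *                  buf.bytes, buf.ptr, &err);
     * (context and err are assumed to exist in the caller.) */
    return buf;
}

static void pinned_free(pinned_buffer *buf) {
    if (buf == NULL || buf->ptr == NULL) {
        return;
    }
    if (buf->locked) {
        munlock(buf->ptr, buf->bytes);
    }
    free(buf->ptr);
    buf->ptr = NULL;
    buf->bytes = 0;
    buf->locked = 0;
}
```

The cl_mem object should be released before pinned_free() is called, since with CL_MEM_USE_HOST_PTR the implementation may keep using the host memory for the lifetime of the buffer object.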
Thank you for the response! Your idea should be worth trying if the vendor supports the flag. By the way, I'm working on an embedded system.
I found out that the problem originated in the driver configuration: CPU caching was turned off and CL_MEM_ALLOC_HOST_PTR didn't work properly :P