Hello,
I need to create a piece of software that moves a continuous stream of data to the GPU for processing and then fetches the data back (or what's left of it). So it forwards the stream after processing.
Is there a way to do these transfers (never mind the computation for now) without seriously increasing data latency? I need this to be done in under 5 ms (1 ms preferred).
What transfer rates can be achieved between host and GPU?
Which transfer method should I choose for this kind of transfer?
(The selected platform is AMD; however, NVIDIA is an option. Currently I can only test on NVIDIA.)
I found a bandwidth testing app (in the NVIDIA OpenCL examples) that is configured to use pinned memory with mapped access and is blazing fast. However, I still don't understand when the actual transfer happens:
// MAPPED: mapped pointers to device buffer for conventional pointer access
void* dm_idata = clEnqueueMapBuffer(cqCommandQueue, cmDevData, CL_TRUE,
                                    CL_MAP_WRITE, 0, memSize, 0, NULL, NULL, &ciErrNum);
oclCheckError(ciErrNum, CL_SUCCESS);
for (unsigned int i = 0; i < MEMCOPY_ITERATIONS; i++)
{
    memcpy(dm_idata, h_data, memSize);  // host data into the mapped pointer
}
ciErrNum = clEnqueueUnmapMemObject(cqCommandQueue, cmDevData, dm_idata, 0, NULL, NULL);
oclCheckError(ciErrNum, CL_SUCCESS);
The sample loops MEMCOPY_ITERATIONS times to measure the bandwidth.
Does the memcpy actually move data between the host and the GPU? (So does it really do MEMCOPY_ITERATIONS transfers of h_data to the GPU, where it could be processed?)
Or does the transfer happen when we unmap the memory object?
I can keep the queues open and the memory allocated/mapped, and keep reusing them, right?
Sorry for the noob questions, I'm still in the dark here.
I really appreciate your help.
Thank you in advance!
Bests,
Semirke