Thread: clEnqueueReadBuffer() takes msec to complete???

  1. #1
    Junior Member
    Join Date
    Jun 2009

    clEnqueueReadBuffer() takes msec to complete???


    I am using the CPU SDK on a machine with 16GB of RAM, running OpenSUSE. I did some profiling on the command queue, and either I messed up something huge or this is less than acceptable performance. In summary, I am measuring how long it takes to read a buffer back after the kernel executes, for int buf[16] (just 16 ints!!!). Here's what I got:

    Code :
        //HOST SIDE (nthreads = 16)
        //Create the result buffer (the flags, size and host_ptr arguments were left out of the post)
        d_calc2_res = clCreateBuffer(context,
                                     &err);
        checkResult((err == CL_SUCCESS), "clCreateBuffer failed\n");
        //Pass the buffer to the kernel
        err = clSetKernelArg(calc2_kernel, 10, sizeof(cl_mem), static_cast<void *>(&d_calc2_res));
        checkResult((err == CL_SUCCESS), "clSetKernelArg failed\n");
        //Fill it up with values in kernel (verified correct kernel execution)
        //Read the result back (blocking read: CL_TRUE)
        err = clEnqueueReadBuffer(cmdQueue, d_calc2_res, CL_TRUE,
                                  0, nthreads*sizeof(int),
                                  static_cast<void *>(calc2_res),
                                  0, NULL, &eventh);
        //Check the enqueue result before touching the event
        checkResult((err == CL_SUCCESS), "clEnqueueReadBuffer failed\n");
        clWaitForEvents(1, &eventh);
        //Read profiling info (the code that sets tstart and tend before each printf was left out of the post)
        printf("\n\tRead buffer time for submit (1 pass):\t%f msec\n\n",(tend-tstart)/1000000.0);
        printf("\n\tRead buffer time for execute (1 pass):\t%f msec\n\n",(tend-tstart)/1000000.0);
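
    The post does not show how tstart and tend were obtained. One common way is to query the event with clGetEventProfilingInfo for CL_PROFILING_COMMAND_QUEUED, CL_PROFILING_COMMAND_SUBMIT, CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END, which only works if the command queue was created with CL_QUEUE_PROFILING_ENABLE. The sketch below covers just the arithmetic on those four nanosecond timestamps, so it needs no OpenCL headers. The struct and function names are hypothetical, and mapping "submit" to queued-to-submit and "execute" to start-to-end is an assumption about what the two printed figures mean.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the four event timestamps, in nanoseconds,
 * as clGetEventProfilingInfo would report them for
 * CL_PROFILING_COMMAND_QUEUED, _SUBMIT, _START and _END. */
typedef struct {
    uint64_t queued;
    uint64_t submit;
    uint64_t start;
    uint64_t end;
} event_times_ns;

/* Convert a nanosecond interval to milliseconds. */
double interval_ms(uint64_t from_ns, uint64_t to_ns)
{
    assert(to_ns >= from_ns);            /* timestamps must be ordered */
    return (double)(to_ns - from_ns) / 1.0e6;
}

/* Assumed meaning of "time for submit": host enqueue until the command
 * is handed to the device. */
double submit_ms(const event_times_ns *t)
{
    return interval_ms(t->queued, t->submit);
}

/* Assumed meaning of "time for execute": the device starts the command
 * until it finishes. */
double execute_ms(const event_times_ns *t)
{
    return interval_ms(t->start, t->end);
}
```

    Printing submit_ms() and execute_ms() separately also avoids reusing the same tstart/tend pair for both lines.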

    The profiling timer has nanosecond resolution, and it agrees closely with the accurate timer I used before OpenCL; both give about the same numbers:

    Read buffer time for submit (1 pass): 0.007054 msec
    Read buffer time for execute (1 pass): 0.298222 msec

    So, 0.3 msec to copy 16 ints???? I tried both the blocking and non-blocking options, same thing. Is this to be expected, and if so, what workarounds do we have to get decent performance?

  2. #2
    Senior Member
    Join Date
    Jul 2009
    Northern Europe

    Re: clEnqueueReadBuffer() takes msec to complete???

    Performance is completely dependent on the implementation. You'll have to talk to whoever provided you with the SDK about the particulars.

    With that said, when you read back data with clEnqueueReadBuffer, there is the overhead of enqueueing the read and executing it. You will never get good performance with small chunks of data as this overhead will swamp the transfer. Try transferring 8-64MB and see what performance you get.
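
    To put rough numbers on that, here is a minimal sketch of a cost model, not a measurement: each read pays a fixed overhead, then the bytes move at a constant bandwidth. The function names and the linear model itself are assumptions for illustration. Real implementations will not behave this cleanly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: total time of one read in milliseconds, as a fixed
 * per-call overhead plus the bytes divided by a constant bandwidth
 * (expressed in bytes per millisecond). */
double read_time_ms(size_t bytes, double overhead_ms, double bytes_per_ms)
{
    assert(overhead_ms >= 0.0 && bytes_per_ms > 0.0);
    return overhead_ms + (double)bytes / bytes_per_ms;
}

/* Fraction of the total read time spent on the fixed overhead (0..1). */
double overhead_fraction(size_t bytes, double overhead_ms, double bytes_per_ms)
{
    double total = read_time_ms(bytes, overhead_ms, bytes_per_ms);
    if (total == 0.0) {
        return 0.0;                      /* nothing to do, nothing spent */
    }
    return overhead_ms / total;
}
```

    With overhead_ms = 0.3 (roughly the figure from the first post) and bytes_per_ms = 1.0e6 (an assumed 1 GB/s), a 64-byte read is almost entirely overhead, while for a 64MB read the overhead is well under 1% of the total.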

  3. #3
    Junior Member
    Join Date
    Jun 2009

    Re: clEnqueueReadBuffer() takes msec to complete???

    Am I the only one who thinks this is extremely slow? On the same note, I understand many of these performance numbers depend on the specific vendor implementations, but can we ask for certain base metrics to be part of the inner (core) requirement for certification?

    The premise of OpenCL is platform independence, GIVEN that implementations perform on par with some expectations; have any such expectations been set for all qualifying (certified) vendors? Can anybody post vendor comparisons on key metrics?

  4. #4
    Senior Member
    Join Date
    Sep 2002
    Santa Clara

    Re: clEnqueueReadBuffer() takes msec to complete???

    The OpenCL conformance tests are used to verify compliance of an implementation. However, I'm not sure that we can require performance expectations that implementations must meet or exceed as part of compliance.

    IMO the best way to solve this is to work with the vendor, via the vendor's forum or directly, to get these kinds of issues resolved.