
Thread: OpenCL Bandwidth Testkernel

  1. #1
    Junior Member, joined Nov 2011

    OpenCL Bandwidth Testkernel


    At the moment I am trying to measure the global memory read/write bandwidth of my GPU. I use this kernel:

    Code :
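    __kernel void bandwidth(__global const float *idata,
                            __global float *odata,
                            int offset)
    {
        // One element per work-item. With offset > 0 the last `offset`
        // work-items read/write past the end unless the buffers are padded.
        int xid = get_global_id(0) + offset;
        odata[xid] = idata[xid];
    }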
    I just copy buffer idata to odata, one element per work-item. The offset argument is normally zero; it is only there to observe the impact of uncoalesced memory access.
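To reason about what a non-zero offset does, here is a minimal sketch under a simplified coalescing model. It assumes groups of 16 consecutive work-items (an NVIDIA-style half-warp) served in aligned 64-byte segments; both numbers are assumptions and the real rules depend on the GPU generation. segmentsTouched is a hypothetical helper, not part of OpenCL.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Simplified coalescing model (an assumption, not the exact rules of any
// specific GPU): `threads` consecutive work-items each access one
// `wordBytes`-sized element, starting at element index `offset`, and the
// memory system serves them in aligned segments of `segmentBytes`.
// Returns how many distinct segments the group touches.
std::size_t segmentsTouched(std::size_t offset, std::size_t threads,
                            std::size_t wordBytes, std::size_t segmentBytes) {
    assert(threads > 0 && wordBytes > 0 && segmentBytes > 0);
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < threads; ++t) {
        std::size_t byteAddress = (offset + t) * wordBytes; // address used by work-item t
        segments.insert(byteAddress / segmentBytes);        // aligned segment it falls in
    }
    return segments.size();
}
```

With 4-byte floats, segmentsTouched(0, 16, 4, 64) is 1 while segmentsTouched(1, 16, 4, 64) is 2: under this model an offset of one element doubles the number of memory transactions per group, which is the effect the offset parameter is meant to expose.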

    My graphics card, an NVIDIA GeForce GT 330, has a theoretical bandwidth of 24 GB/s, but with this little test kernel I only reach about 1.3 GB/s at best.

    I calculate the bandwidth with the formula from the OpenCL bandwidth manual, ((Br + Bw) * sizeof(float) / 1024^3) / seconds, where Br and Bw are the numbers of global memory reads and writes; for this copy kernel Br = Bw = elements, so the numerator is 2 * datasize bytes. The time is measured with the GPU timer (a cl_event, with profiling enabled on the command queue). The input is 400 MB (an array of 104,857,600 floats), and I launch 104,857,600 work-items in work-groups of 256. Here is my host code:

    Code :
    //Set up context, queue and compile program
    ipf::parser::cl::CLContext context;
       std::cout<<"CommandQueue Failed"<<std::endl;
    //Compiling program and creating kernel
    std::string path =
       ipf::util::config.get<std::string>("ipf.shader.path") + "/";
    path += "";
    ipf::parser::cl::CLProgram prog = context.createProgram(path);
    ipf::parser::cl::CLKernel bw = prog.createKernel("bandwidth");
    //Set workitems and groups
    int WGX = 256;
    int elements = 10240*10240;
    int datasize = sizeof(float) * elements;
    std::vector<size_t> local;
    std::vector<size_t> global;
    //Create buffers on host and device
    std::vector<float> a(elements,1.0f);
    std::vector<float> b(elements,0.0f);
    ipf::parser::cl::CLBuffer a_dev =
    ipf::parser::cl::CLBuffer b_dev =
    //Copy buffer on device (blocking mode)
    //Set kernel args and run kernel
    bw.setArg(0, a_dev);
    bw.setArg(1, b_dev);
    bw.setArg(2, 0);
    ipf::parser::cl::CLEvent r =;	
    //Wait for finish
    //Read data back[0],datasize);
    //Calculate bandwidth and print out results
    cl_ulong start = r.runTime();
    cl_ulong end = r.finishTime();
    double mili = ((double)(end) - (double)(start))* 10e-6;
    double second= ((double)(end) - (double)(start))* 10e-9;
    cl_ulong result_nano = end - start;
    std::cout<<"Milliseconds: "<<mili<<std::endl;
    std::cout<<"Seconds     : "<<second<<std::endl;
    std::cout<<"Data (in MB): "<<(elements*4*2) / 1024 / 1024 <<std::endl;
    std::cout<<"Bandwidth   : "<<ipf::parser::cl::bandWidth(elements,r)<<std::endl;
    std::cout<<"***********END RESULT**********"<<std::endl;

    Am I doing something wrong, or is my graphics card just damn slow?
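The arithmetic behind the numbers in this post can be checked with a short C++ sketch. It assumes the kernel's start and end times are available as nanosecond timestamps (as OpenCL's CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END values are) and that WGX = 256 is passed as the local (work-group) size. The helper names are hypothetical; effectiveBandwidthGiB is not the ipf::parser::cl::bandWidth function used above, whose implementation is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Effective bandwidth in GiB/s (1 GiB = 1024^3 bytes):
// ((Br + Bw) * elementBytes / 1024^3) / seconds, where Br/Bw are the numbers
// of global-memory element reads/writes and startNs/endNs are profiling
// timestamps in nanoseconds (cl_ulong values are 64-bit unsigned integers).
double effectiveBandwidthGiB(std::uint64_t reads, std::uint64_t writes,
                             std::uint64_t elementBytes,
                             std::uint64_t startNs, std::uint64_t endNs) {
    assert(endNs > startNs);
    const double bytes = static_cast<double>(reads + writes) *
                         static_cast<double>(elementBytes);
    const double seconds = static_cast<double>(endNs - startNs) * 1e-9; // 1 ns = 1e-9 s
    return bytes / (1024.0 * 1024.0 * 1024.0) / seconds;
}

// Round a 1-D global work size up to a multiple of the work-group (local)
// size; OpenCL 1.x requires the global size to be divisible by the local size.
std::size_t roundUpGlobal(std::size_t globalItems, std::size_t localSize) {
    assert(localSize > 0);
    return ((globalItems + localSize - 1) / localSize) * localSize;
}

// Number of work-groups in a 1-D NDRange after rounding.
std::size_t numWorkGroups(std::size_t globalItems, std::size_t localSize) {
    return roundUpGlobal(globalItems, localSize) / localSize;
}
```

For the copy kernel, reads = writes = 104857600 and elementBytes = 4, so each launch moves 0.78125 GiB; a hypothetical kernel time of 100 ms would correspond to 7.8125 GiB/s. With a local size of 256, the 104,857,600 work-items form numWorkGroups(104857600, 256) = 409,600 work-groups. If the global size ever had to be rounded up, the extra work-items would index past the end of the buffers, because the kernel has no bounds check.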

  2. #2
    Junior Member, joined Nov 2011

    Re: OpenCL Bandwidth Testkernel

    Epic fail:

    Code :
    double second= ((double)(end) - (double)(start))* 10e-9;

    must be:

    Code :
    double second= ((double)(end) - (double)(start))* 1e-9;

    Damn it!
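In C++ the literal 10e-9 means 10 * 10^-9 = 1e-8, so the original line made the elapsed time ten times too long, and any bandwidth computed from it ten times too low. The millisecond line in the first post has the same slip (10e-6 where 1e-6 is needed). A minimal sketch of both conversions, assuming nanosecond timestamps as returned by OpenCL event profiling; elapsedSeconds and elapsedMilliseconds are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Convert a profiling interval given in nanoseconds to seconds: 1 ns = 1e-9 s.
// (The literal 10e-9 would be 1e-8, i.e. ten times too large.)
double elapsedSeconds(std::uint64_t startNs, std::uint64_t endNs) {
    assert(endNs >= startNs);
    return static_cast<double>(endNs - startNs) * 1e-9;
}

// Same interval in milliseconds: 1 ns = 1e-6 ms (not 10e-6).
double elapsedMilliseconds(std::uint64_t startNs, std::uint64_t endNs) {
    assert(endNs >= startNs);
    return static_cast<double>(endNs - startNs) * 1e-6;
}
```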
