Using clEnqueueWriteBuffer with C++ vectors

Hello all,

I am a novice to programming in general and am attempting to add two 2D C++ vectors a specified number of times on the GPU using OpenCL. I think I am running into issues when I write my data to the device buffer, and I am unsure how I need to create my vectors in order to pass them to the GPU. When I run this code, I get a pointer error as soon as I access the vector that receives the data from the GPU, which suggests to me that the data are never actually brought back from the device.

Here I post a portion of my code. I have some functions that I wrote called packVector and unPackVector. These are simply used to make the vectors one dimensional so I can pass them to the GPU and when receiving 1D data from the GPU, make it 2D.
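The two helpers are not shown in the post, so the following is only a guess at what they might look like: a minimal sketch that assumes rectangular matrices stored in row-major order. The originals are called as members of an object named arr; the free functions below are hypothetical stand-ins that merely reuse the names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for packVector: flattens a 2D vector into a 1D
// vector, row after row (row-major order).
void packVector(const std::vector<std::vector<float> >& src,
                std::vector<float>& dst)
{
    dst.clear();
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst.insert(dst.end(), src[i].begin(), src[i].end());
    }
}

// Hypothetical stand-in for unPackVector: copies a 1D row-major vector back
// into an already-sized 2D vector. Stops early if src runs out of elements.
void unPackVector(const std::vector<float>& src,
                  std::vector<std::vector<float> >& dst)
{
    std::size_t k = 0;
    for (std::size_t i = 0; i < dst.size(); ++i) {
        for (std::size_t j = 0; j < dst[i].size() && k < src.size(); ++j) {
            dst[i][j] = src[k++];
        }
    }
}
```

With helpers like these, the packed vector holds n1*n2 floats in one contiguous block, which is the layout a single buffer transfer expects.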

I have successfully implemented this code with 2D arrays but I am getting memory leaks and therefore would like to use vectors.

The GPU I am using is an NVIDIA GeForce 9400M, and I am using the OpenCL framework on Mac OS X 10.7.5.

Thank you in advance.


	int n1 = 2;
	int n2 = 2;
	int dims = n1*n2;
	int iters = 2;

	std::vector<std::vector<float> > h_xx(n1, std::vector<float>(n2));
	std::vector<std::vector<float> > h_yy(n1, std::vector<float>(n2));
	std::vector<std::vector<float> > h_zz(n1, std::vector<float>(n2));

	std::vector<float> h_x(dims);
	std::vector<float> h_y(dims);
	std::vector<float> h_z(dims);


	for (int i = 0; i < n1; ++i){
		for (int j = 0; j < n2; ++j){
			h_xx[i][j] = 1;
			h_yy[i][j] = 1;
		}
	}


	cl_mem d_xx, d_yy, d_zz;

    d_xx = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(float)*dims, NULL, &err);
    if(err != CL_SUCCESS){
        std::cout << "Error: Could not create the buffer." << std::endl;
        exit(1);
    }

    d_yy = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(float)*dims, NULL, &err);
    if(err != CL_SUCCESS){
        std::cout << "Error: Could not create the buffer." << std::endl;
        exit(1);
    }

    d_zz = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(float)*dims, NULL, &err);
    if(err != CL_SUCCESS){
        std::cout << "Error: Could not create the buffer." << std::endl;
        exit(1);
    }

    arr.packVector(h_xx, h_x);
    arr.packVector(h_yy, h_y);

    err = clEnqueueWriteBuffer(queue, d_xx, CL_FALSE, 0, sizeof(float)*h_x.size(), &h_x, 0, NULL, NULL);
    if(err != CL_SUCCESS){
        std::cout << "Error: Could not write vector to buffer." << std::endl;
        std::cout << "OpenCL error code: " << err << std::endl;
        exit(1);
    }

    err = clEnqueueWriteBuffer(queue, d_yy, CL_FALSE, 0, sizeof(float)*h_y.size(), &h_y, 0, NULL, NULL);
    if(err != CL_SUCCESS){
        std::cout << "Error: Could not write vector to buffer." << std::endl;
        std::cout << "OpenCL error code: " << err << std::endl;
        exit(1);
    }
    

    err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &d_xx);
    if(err != CL_SUCCESS){
        std::cout << "Error: Could not set the kernel argument." << std::endl;
		std::cout << "OpenCL error code: " << err << std::endl;
        exit(1);
    }
    err = clSetKernelArg(kernel, 1, sizeof(cl_mem), &d_yy);
    if(err != CL_SUCCESS){
        std::cout << "Error: Could not set the kernel argument." << std::endl;
		std::cout << "OpenCL error code: " << err << std::endl;
        exit(1);
    }
    err = clSetKernelArg(kernel, 2, sizeof(cl_mem), &d_zz);
    if(err != CL_SUCCESS){
        std::cout << "Error: Could not set the kernel argument." << std::endl;
		std::cout << "OpenCL error code: " << err << std::endl;
        exit(1);
    }
	
    err = clSetKernelArg(kernel, 3, sizeof(iters), &iters);
    if(err != CL_SUCCESS){
        std::cout << "Error: Could not set the integer kernel argument." << std::endl;
        std::cout << "OpenCL error code: " << err << std::endl;
        exit(1);
    }

    size_t work_units_per_kernel = dims;
    err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &work_units_per_kernel, NULL, 0, NULL, NULL);
    if(err != CL_SUCCESS){
        std::cout << "Error: Could not execute the kernel." << std::endl;
        exit(1);
    }

    err = clEnqueueReadBuffer(queue, d_zz, CL_TRUE, 0, sizeof(float)*h_z.size(), &h_z, 0, NULL, NULL);
    if(err != CL_SUCCESS){
        std::cout << "Error: Could not read vector from the kernel." << std::endl;
        std::cout << "OpenCL error code: " << err << std::endl;
        exit(1);
    }

    arr.unPackVector(h_z, h_zz);

    arr.printVec2D(h_zz);

Are you sure that the address of the std::vector gives you an array of floats? That would mean &h_x is the same as a float[dims].

Thank you clint3112 for your reply.

No, I am not sure. I believe that is why my program does not work. How then should I pass the vector of floats to the function clEnqueueWriteBuffer?

&h_z.at(0) will give you a valid pointer to the first element of h_z.
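To make the difference concrete, here is a small sketch (the function names are made up for illustration) that compares the address of the vector object with the address of its first element, and checks that the elements sit back to back in memory. It assumes a non-empty vector, since at(0) throws on an empty one.

```cpp
#include <cstddef>
#include <vector>

// Returns true when the address of the vector object differs from the
// address of its first element. A std::vector keeps its elements in
// separately allocated storage, so this is expected for any non-empty vector.
bool objectAndElementsDiffer(const std::vector<float>& v)
{
    const void* object_addr  = static_cast<const void*>(&v);
    const void* element_addr = static_cast<const void*>(&v.at(0));
    return object_addr != element_addr;
}

// Returns true when a pointer to element 0 plus i reaches element i for
// every i, i.e. the elements form one contiguous array.
bool elementsAreContiguous(const std::vector<float>& v)
{
    const float* first = &v.at(0);
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (first + i != &v[i]) {
            return false;
        }
    }
    return true;
}
```

So &h_x points at the vector's bookkeeping object, while &h_x.at(0) points at the floats themselves, and only the latter is meaningful as a source or destination for a buffer transfer.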

Thank you utnapishtim for your reply as well.

Is that the only way I can reference the data within my vector as a valid pointer? I think that means that I might have to loop over all of the elements of my vector in order to get my data to the GPU. This I would rather not do as it appears that it might be rather slow versus copying an entire array. Is there any other way that I might copy the data from my vector to the device buffer?

Thanks again for all of the help.

&h_z.at(0) is the same as &foo[0] for an array float foo[n], so you don't need to copy anything. You just have to pass that address to the enqueue call.

But do I not need to pass the address of each element of the vector to the enqueue call? For example, if my h_xx and h_yy are 3x3 matrices filled with ones, when I pass just &h_x.at(0) and &h_y.at(0), the resulting h_zz is a matrix with the first row filled with twos and the rest zeros. I guess I still don't understand how to copy all of the contents of the vector with one enqueue call.

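A std::vector<float> stores its elements contiguously, so your packed 1D vector is linear in memory no matter how many dimensions it represents. (A nested std::vector<std::vector<float> > is not contiguous across rows, which is why the packing step is needed.) So pass the address of the first element to clEnqueueWriteBuffer, as suggested above.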

Just pass the start address and, for the 3x3 matrix, 9*sizeof(float) as the size of your data.
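As a rough illustration of "one start address plus one byte count", the sketch below uses std::memcpy as a stand-in for the transfer. It is not an OpenCL call, and byteSize and copyBlock are hypothetical helper names; the point is only that a single call moves every element when the byte count covers the whole vector.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Byte count to hand to the enqueue call for a packed vector.
std::size_t byteSize(const std::vector<float>& v)
{
    return sizeof(float) * v.size();
}

// Stand-in for a host-to-device transfer: copies `bytes` bytes starting at
// `src` into a new vector. std::memcpy plays the role clEnqueueWriteBuffer
// would play in the real program.
std::vector<float> copyBlock(const float* src, std::size_t bytes)
{
    std::vector<float> dst(bytes / sizeof(float));
    if (!dst.empty()) {
        std::memcpy(&dst[0], src, bytes);
    }
    return dst;
}
```

For a packed 3x3 matrix h_x, copyBlock(&h_x.at(0), byteSize(h_x)) returns all nine floats. A byte count smaller than the full vector copies only the leading elements.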

You can also use std::vector::data() if using MSVC 2010 or later, or clang/gcc with -std=c++11. This returns the address of the first element in the internal array. For example:

clEnqueueWriteBuffer(queue, d_xx, CL_FALSE, 0, sizeof(float)*h_x.size(), h_x.data(), 0, NULL, NULL);
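The same pointer works on the read side, where the destination vector must already be large enough before its address is handed over. The sketch below assumes a C++11 compiler; dataMatchesFirstElement and readBack are made-up names, and std::memcpy again stands in for the real clEnqueueReadBuffer call.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// True when data() and &v[0] name the same address (non-empty vector).
bool dataMatchesFirstElement(std::vector<float>& v)
{
    return v.data() == &v[0];
}

// Stand-in for a device-to-host read: sizes `host` first, then fills it in
// one copy through host.data(). std::memcpy substitutes for
// clEnqueueReadBuffer here; `device_copy` is a hypothetical placeholder for
// the contents of the device buffer.
void readBack(const std::vector<float>& device_copy, std::vector<float>& host)
{
    host.resize(device_copy.size());
    if (!host.empty()) {
        std::memcpy(host.data(), device_copy.data(),
                    sizeof(float) * host.size());
    }
}
```

Unlike &v[0] or &v.at(0), calling data() on an empty vector is allowed, although the returned pointer must not be dereferenced in that case.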