Passing values between host and kernel code in a loop in OpenCL

I am having trouble passing values between host code and kernel code because of some vector data types. The following code is only a reference for my problem; my actual code is much bigger and more complicated. With this small example, I hope to explain where the problem is. If anything more is needed, please let me know.

std::vector<std::vector<double>> output;

for (int i = 0; i < 2; i++)
{
  auto& out = output[i];
  sum = 0;
  for (int l = 0; l < 3; l++)
  {
    for (int j = 0; j < 4; j++)
    {
      if (some condition is true)
        { out[j+l] = 0.; }
      sum += .....some addition...
    }
    out[j+l] = sum;
  }
}

Now I want to parallelize this code, starting from the second loop (the one over l). This is what I have done in the host code:

cl::Buffer out(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, output.size(), &output, NULL);
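
One thing worth checking in the call above: `&output` is the address of the outer `std::vector` object, not of the doubles, and `output.size()` is the number of rows, not a byte count. Each inner vector owns its own allocation, so the rows are not contiguous in memory. A minimal sketch of one way around this, assuming all rows have the same length, is to flatten the data into a single `std::vector<double>` first. The helper names `flatten` and `byte_size` are made up for this illustration and are not part of OpenCL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy a vector of equally sized rows into one
// contiguous array, row after row.
std::vector<double> flatten(const std::vector<std::vector<double>>& rows)
{
    std::vector<double> flat;
    if (rows.empty()) return flat;
    const std::size_t cols = rows[0].size();
    flat.reserve(rows.size() * cols);
    for (const auto& row : rows) {
        assert(row.size() == cols);  // assumption: rectangular data
        flat.insert(flat.end(), row.begin(), row.end());
    }
    return flat;
}

// Hypothetical helper: size in bytes, which is what a buffer size expects.
std::size_t byte_size(const std::vector<double>& v)
{
    return v.size() * sizeof(double);
}
```

With this, the buffer would be created from `flat.data()` and `byte_size(flat)` rather than `&output` and `output.size()`, and row i would start at offset `i * cols` in the flat array. Also note that `CL_MEM_READ_ONLY` marks the buffer as read-only from the kernel's side, so a buffer the kernel writes to would probably need `CL_MEM_READ_WRITE`.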

Then, I have set the arguments

cl::SetKernelArg(0, out);

Then the loop,

for (int i = 0; i < 2; i++)
{
  auto& out = output[i];
  // sending some more arguments (which change on every iteration) for the sum operations
  queue.enqueueNDRangeKernel(.......);
  queue.enqueueReadBuffer(.....,&out,...);
}
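
A note on the read-back in this loop: the destination pointer for a buffer read should be the address of the doubles, `out.data()`, and the size should be given in bytes. Passing `&out` gives the address of the `std::vector` object itself. The sketch below imitates the copy with `std::memcpy` so that it runs without an OpenCL device; `device_result` is a stand-in for the buffer contents and `read_back_row` is a hypothetical helper, not an OpenCL call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copy one row's worth of doubles from a flat result
// array (standing in for the device buffer) into the host row.
void read_back_row(const std::vector<double>& device_result,
                   std::size_t row_index,
                   std::vector<double>& out)
{
    const std::size_t cols = out.size();
    if (cols == 0) return;  // nothing to copy
    const std::size_t offset = row_index * cols;
    assert(offset + cols <= device_result.size());
    // out.data() points at the doubles; &out would point at the vector object.
    std::memcpy(out.data(), device_result.data() + offset,
                cols * sizeof(double));
}
```

In the real loop the same pointer and byte count would go to `enqueueReadBuffer`, with a blocking read (or an event wait) before `output[i]` is used on the host.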

In the kernel code:

__kernel void sumout(__global double* out, ....)
{
  int l = get_global_id(0);
  int j = get_global_id(1);
  if (some condition is true)
  {
    out[j+l] = 0.;
    return;
  }
  sum += .....some addition... // so out[j+l] = 0 every time it reaches here
  out[j+l] = sum;
}
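
One thing to keep in mind with this kernel: every (l, j) pair is a separate work-item, and work-items may run in any order or at the same time. With the index `j+l`, several work-items target the same element (for example l=0, j=1 and l=1, j=0), and `sum` is no longer accumulated sequentially as in the serial loop. The sketch below, in plain C++ so that it runs on the host, counts how many distinct elements the 3 x 4 index space reaches under `j+l` versus a row-major index `l*4+j`. The function `count_distinct` and the row-major alternative are illustrative assumptions, not taken from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>

// Hypothetical helper: enumerate an L x J index space (as a 2D NDRange
// would) and count how many distinct output elements the mapping touches.
std::size_t count_distinct(int L, int J,
                           const std::function<int(int, int)>& index)
{
    std::set<int> touched;
    for (int l = 0; l < L; ++l)
        for (int j = 0; j < J; ++j)
            touched.insert(index(l, j));
    return touched.size();
}

// Index used in the question: j + l.
std::size_t distinct_sum_index()
{
    return count_distinct(3, 4, [](int l, int j) { return j + l; });
}

// Assumed alternative for comparison: row-major l * 4 + j.
std::size_t distinct_row_major()
{
    return count_distinct(3, 4, [](int l, int j) { return l * 4 + j; });
}
```

With 12 work-items and only 6 distinct targets under `j+l`, which write ends up in each element depends on execution order, so the result need not match the serial loop.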

So now, inside the if condition, out[j+l] is set to 0 in the loop, so the value of out keeps changing. In the normal (serial) code, out is a reference to a vector. I am not able to get the values of out back into output between my kernel and host code. I want output[i] to reflect every change to out[j+l], but I am confused by the buffer and the vector.

Just for more clarification: output is a vector of vectors, and out is a reference to one of the vectors inside output. I need to update the values in output for every change in out. Since these are vectors, I passed out as a cl buffer. I hope this is clear. Please let me know if more code is required; I will try to provide as much as I can.