Send the same data to multiple kernels

I have multiple kernels. I send some inputs to the first one, and the output of each kernel is the input for the next. My queue of kernels repeats this 8 times, until the last kernel produces the actual output that I need.

This is an example of what I did:


    cl::Kernel kernel1 = cl::Kernel(OPETCL::program, "forward");

    // set the kernel arguments
    kernel1.setArg(0, cl_rowCol);
    kernel1.setArg(1, cl_data);
    kernel1.setArg(2, cl_x);
    kernel1.setArg(3, cl_b);
    kernel1.setArg(4, sizeof(int), &largo);

    // run the kernel
    OPETCL::queue.enqueueNDRangeKernel(kernel1, cl::NullRange, global, local, NULL, &profilingApp);

    /********************************/
    /**    run the X symmetries    **/
    /********************************/
    cl::Kernel kernel2 = cl::Kernel(OPETCL::program, "forward_symmX");

    // set the kernel arguments
    kernel2.setArg(0, cl_rowCol);
    kernel2.setArg(1, cl_data);
    kernel2.setArg(2, cl_x);
    kernel2.setArg(3, cl_b);
    kernel2.setArg(4, cl_symmLOR_X);
    kernel2.setArg(5, cl_symm_Xpixel);
    kernel2.setArg(6, sizeof(int), &largo);

    // run the kernel
    OPETCL::queue.enqueueNDRangeKernel(kernel2, cl::NullRange, global, local, NULL, &profilingApp);

    OPETCL::queue.finish();

    OPETCL::queue.enqueueReadBuffer(cl_b, CL_TRUE, 0, sizeof(float) * lors, b, NULL, NULL);

In this case, cl_b is the output that I need.

My question is about the arguments: almost all of them are the same for every kernel, and only one or two differ.

Is the way I set the arguments correct?
Do the arguments stay on the device during the execution of all the kernels?

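If I understood you correctly, then yes: you can change a kernel's arguments right after you enqueue it for execution. The kernel that is already enqueued keeps the argument values it had at enqueue time, and the contents of your buffers persist between kernel launches. Here is how I implemented a reduction, for instance.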

    do {
      // round up so that a partially filled last group still gets a work-group
      int group_count = cur_sz / WORK_GROUP_SZ + ((cur_sz % WORK_GROUP_SZ) ? 1 : 0);
      size_t thr_count = group_count * WORK_GROUP_SZ;

      // this launch uses the argument values that are set at this point
      clEnqueueNDRangeKernel(queue, argmin_reduce_kernel, 1, NULL, &thr_count, &WORK_GROUP_SZ, 0, NULL, NULL);

      // prepare the next pass: swap the two buffers and update the size
      cur_sz = group_count;
      swap(reduce2_bfs[0], reduce2_bfs[1]);
      clSetKernelArg(argmin_reduce_kernel, 0, sizeof(cl_mem), reduce2_bfs);
      clSetKernelArg(argmin_reduce_kernel, 1, sizeof(cl_mem), reduce2_bfs + 1);
      clSetKernelArg(argmin_reduce_kernel, 2, sizeof(size_t), &cur_sz);
    } while (cur_sz > 1);
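
To make the loop's size arithmetic easier to follow, here is a small host-only sketch with no OpenCL calls. `reduction_pass_sizes` is a hypothetical helper that is not part of my actual code. It assumes that each work-group writes exactly one partial result per pass (which is what `cur_sz = group_count` implies) and that the work-group size is greater than zero.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only): replays the size arithmetic of the
// reduction loop without touching OpenCL.
// Assumption: each work-group emits exactly one partial result per pass,
// and work_group_sz > 0.
std::vector<std::size_t> reduction_pass_sizes(std::size_t cur_sz,
                                              std::size_t work_group_sz) {
    std::vector<std::size_t> sizes;
    do {
        // Round up: a partially filled last group still needs a work-group.
        std::size_t group_count = (cur_sz + work_group_sz - 1) / work_group_sz;
        sizes.push_back(group_count);
        // The partial results of this pass are the input of the next pass.
        cur_sz = group_count;
    } while (cur_sz > 1);
    return sizes;
}
```

With made-up inputs of 1000 elements and a work-group size of 64, the helper returns 16 and then 1, so the kernel would be enqueued twice, with the two buffers swapped between the passes.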