Array initialization failed

Dear experts,

I need some help and guidance on the following problem.

I have written the following kernel to initialize an array of 76800 elements to INT_MAX.
const char init_arrays_cl[] =
"__kernel void init_arrays(\n"
"    __global int* input_1,\n"
"    int value,\n"
"    uint length)\n"
"{\n"
"    const uint index = get_global_id(0);\n"
"\n"
"    if (index < length) {\n"
"        input_1[index] = value;\n"
"    }\n"
"}\n";

However, only the first 19200 elements of the output array are set to the desired value, while the rest seem uninitialized. No error message is printed.

Here are parts of my main program:

  1. worksize = 76800;
    mem1 = clCreateBuffer(context, CL_MEM_READ_WRITE, worksize, NULL, &error);
    if (error != CL_SUCCESS){
    printf("clCreateBuffer fails for mem1\n");
    }

  2. cl_kernel k_cfg = clCreateKernel(prog, "init_arrays", &error);
    if (error != CL_SUCCESS){
    printf("clCreateKernel fails %d\n", error);
    }

  3. error = clSetKernelArg(k_cfg, 0, sizeof(cl_mem), &mem1);
    if (error != CL_SUCCESS){
    printf("clSetKernelArg fails for k_cfg mem1: %d\n", error);
    }

  4. g_worksize = 76800;
    

    error=clEnqueueNDRangeKernel(cq, k_cfg, 1, NULL, &g_worksize, NULL, 0, NULL, NULL);
    if (error != CL_SUCCESS){
    printf("clEnqueueNDRangeKernel fails\n");
    }

    error=clEnqueueReadBuffer(cq, mem1, CL_TRUE, 0, worksize, img_disp_left, 0, NULL, NULL);
    if (error != CL_SUCCESS){
    printf("clEnqueueReadBuffer fails for mem1 %d\n", error);
    }

By the way, 19200 is equal to 76800 divided by 4.
I suspect it has something to do with the sizes of int and char…
Any help or advice is appreciated.

error=clEnqueueReadBuffer(cq, mem1, CL_TRUE, 0, worksize, img_disp_left, 0, NULL, NULL);

You are only reading “worksize” bytes, not “worksize” elements: the size argument of clEnqueueReadBuffer is given in bytes. What you want to do is this:

error=clEnqueueReadBuffer(cq, mem1, CL_TRUE, 0, worksize * sizeof(cl_int), img_disp_left, 0, NULL, NULL);
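The arithmetic behind the 19200 figure can be checked without any OpenCL at all. This is only a sketch: the helper names elements_covered and bytes_needed are made up for illustration, and int32_t stands in for cl_int, which is a 32-bit type.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many whole elements a transfer of `bytes` bytes covers. */
static size_t elements_covered(size_t bytes, size_t elem_size)
{
    return bytes / elem_size;
}

/* Hypothetical helper: the byte count to pass as the size argument
 * so that `count` elements are created or transferred. */
static size_t bytes_needed(size_t count, size_t elem_size)
{
    return count * elem_size;
}
```

With worksize = 76800, elements_covered(worksize, sizeof(int32_t)) gives the 19200 elements seen in the output, and bytes_needed(worksize, sizeof(int32_t)) gives the 307200 bytes that the size argument should have been.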

Hi David,

Thanks for coming to the rescue again.

So it means that I must think of transfers between the GPU and the host in units of 1 byte (type char or unsigned char), and scale the size by the size of the data type I actually use. Is this correct?

For clarity's sake, in addition to your suggestion, I think I need to create the buffer with the same size, i.e.
mem1 = clCreateBuffer(context, CL_MEM_READ_WRITE, worksize * sizeof(cl_int), NULL, &error);

Finally, is this initialization of arrays better done on the GPU or the CPU? What are your thoughts on this?

So it means that I must think of transfers between the GPU and the host in units of 1 byte (type char or unsigned char), and scale the size by the size of the data type I actually use. Is this correct?

Yes, that’s right. As far as I remember, all sizes and offsets in the OpenCL buffer APIs are expressed in bytes.

For clarity's sake, in addition to your suggestion, I think I need to create the buffer with the same size, i.e.
mem1 = clCreateBuffer(context, CL_MEM_READ_WRITE, worksize * sizeof(cl_int), NULL, &error);

Yes, absolutely! I missed that.

Finally, is this initialization of arrays better done on the GPU or the CPU? What are your thoughts on this?

I would use whatever device is going to use that data later. What you want to avoid is forcing OpenCL to copy data between devices.
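If the data ends up being consumed on the host, the CPU-side counterpart of the init_arrays kernel is just a loop. A minimal sketch, assuming a hypothetical helper named fill_int32 and using int32_t in place of cl_int:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: set the first `length` elements of `dst` to `value`.
 * Host-side counterpart of the init_arrays kernel; `length` is an element
 * count here, not a byte count. */
static void fill_int32(int32_t *dst, size_t length, int32_t value)
{
    for (size_t i = 0; i < length; ++i) {
        dst[i] = value;
    }
}
```

Filling on the host avoids the kernel launch and the read-back, but if a later kernel needs the array on the GPU, the data then has to be copied over, which is exactly the transfer the advice above tries to avoid.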