vloadn and vstoren

Hi everybody, this is my first post on the Khronos forum, so I would like to say hello to the community :slight_smile:

My problem concerns the access to an array using the vstore and vload functions. My kernel function is declared like:

__kernel void wordCount(__global const char *a,__global char *o)
{
int index = get_global_id(0);

char8 c = vload8(0,a);

vstore8(c,0,o);

}

So the input is in *a and the output should be in *o. The kernel should load a part of the *a array into c (according to the thread id) and store it in the output buffer, so that the output matches the input.

PS: I know it is a silly example, but I need to understand how to access an array.

Thank you

Your use of vload8() and vstore8() is legal but you probably mean:

char8 c = vload8(index, a);
vstore8(c, index, o);
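One thing to keep in mind with this form: the first argument of vloadn/vstoren is an offset in units of n elements, so vload8(index, a) reads a[index*8] through a[index*8 + 7]. Here is a minimal plain-Java sketch of that addressing (it is only an emulation, not OpenCL; the class and method names are made up for illustration):

```java
import java.util.Arrays;

class Vload8Addressing {
    static final int N = 8; // vector width of vload8 / vstore8

    // Index of the first element touched by vload8(offset, p): offset * 8.
    static int firstElement(int offset) {
        return offset * N;
    }

    // Plain-Java stand-in for vload8: copies 8 elements starting at offset * 8.
    // Unlike a real kernel, it throws instead of reading out of bounds.
    static byte[] vload8(int offset, byte[] p) {
        int start = firstElement(offset);
        if (start + N > p.length) {
            throw new IndexOutOfBoundsException(
                "elements " + start + ".." + (start + N - 1)
                + " are outside a buffer of length " + p.length);
        }
        return Arrays.copyOfRange(p, start, start + N);
    }

    public static void main(String[] args) {
        byte[] a = new byte[16];
        for (int k = 0; k < a.length; k++) {
            a[k] = (byte) k;
        }
        System.out.println(Arrays.toString(vload8(0, a))); // elements 0..7
        System.out.println(Arrays.toString(vload8(1, a))); // elements 8..15
        // Each work-item handles 8 elements, so 16 elements need 2 work-items.
        System.out.println("work-items needed: " + a.length / N);
    }
}
```

So the buffer length should be a multiple of 8 and the global work size should be the element count divided by 8. With 5 work-items, for example, the last one would touch elements 32 to 39.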

Yes, I have already tried this, but it does not work. I know how the memory works.
Unfortunately, when I use an input longer than 2 letters, I get some strange symbols in the output.

Can anybody else help me?

You should give more information, such as the way you allocate and fill your buffers, and the work sizes of clEnqueueNDRangeKernel().

OK, I’m sorry. At first I thought my problem was inside the kernel; now I understand it could be the different size of a char between C and Java.
In Java one char is 2 bytes, while in C it is just 1.



char chunk[] = new char[]{'a','b','c','d','e'};

Pointer srcA = Pointer.to(chunk);

memObjects[0] = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,Sizeof.cl_char * (chunk.length*2), srcA, null);

clEnqueueWriteBuffer(commandQueue, memObjects[0], CL_TRUE, 0, chunk.length * Sizeof.cl_char, srcA, 0, null, null);

clSetKernelArg(kernel, 0, Sizeof.cl_mem, Pointer.to(memObjects[0]));

clEnqueueNDRangeKernel(commandQueue, kernel, 1, null, global_work_size, local_work_size, 0, null, null);


The kernel is shown above.
I run the kernel with just 5 threads. When I read back my output array (declared like the input), I don’t get the same values, so presumably I am not writing the array correctly inside the kernel.

Now I would like to know if there is a way to solve this problem, which is worrying me!!

Thank you.

Sorry if any other info is missing.

To fully understand your problem, you should explain:

  • how you create your output buffer
  • how you read data from your output buffer after the execution of the kernel
  • how you define global_work_size and local_work_size

WordCount.java


char i[] = new char[]{'a','b','c','d','e'};
char o[] = new char[i.length];

Pointer input = Pointer.to(i);
Pointer output = Pointer.to(o);
        
cl_mem memObjects[] = new cl_mem[2];
memObjects[0] = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, Sizeof.cl_char * (i.length), input, null);
memObjects[1] = clCreateBuffer(context, CL_MEM_READ_WRITE, Sizeof.cl_char * o.length, output, null);

<GETTING ALL PLATFORM AND DEVICE INFO> 

String pr = utils.readKernelFile("kernel.cl");
        
cl_program program = clCreateProgramWithSource(context, 1, new String[]{ pr }, null, null);
clBuildProgram(program, 0, null, null, null, null);        
cl_kernel kernel = clCreateKernel(program, "wordCount", null);
        
clEnqueueWriteBuffer(commandQueue, memObjects[0], CL_TRUE, 0, i.length * Sizeof.cl_char, input, 0, null, null);
clEnqueueWriteBuffer(commandQueue, memObjects[1], CL_TRUE, 0, o.length * Sizeof.cl_char, output, 0, null, null);
        
clSetKernelArg(kernel, 0, Sizeof.cl_mem, Pointer.to(memObjects[0]));
clSetKernelArg(kernel, 1, Sizeof.cl_mem, Pointer.to(memObjects[1]));
        
long global_work_size[] = new long[]{5};
long local_work_size[] = new long[]{1};
        
clEnqueueNDRangeKernel(commandQueue, kernel, 1, null, global_work_size, local_work_size, 0, null, null);

clEnqueueReadBuffer(commandQueue, memObjects[1], CL_TRUE, 0, o.length * Sizeof.cl_char, output, 0, null, null);
        
<RELEASING RESOURCES>
        
System.out.println("\nOutput text   " + Arrays.toString(o)); // It is different from input

kernel.cl file


__kernel void wordCount(__global const char *i,__global char *o)
{
	const int index = get_global_id(0);
	
	barrier(CLK_GLOBAL_MEM_FENCE);
	
	o[index] = i[index];
}

This is all my code. I think the error is caused by the difference in char size.

I don’t know if it is useful, but I am using a GeForce GT 520MX and driver version 319.37.

In Java, a char is 2 bytes long, so you should replace cl_char with cl_ushort in your Java code, and char with ushort in your kernel.

Furthermore, the barrier is unnecessary in your kernel.

Otherwise, everything else seems fine.
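To illustrate the size mismatch, here is a rough plain-Java sketch (no OpenCL involved) of what probably happens when only length * 1 bytes of a 2-byte-per-element char array make the round trip. It assumes a little-endian host and that the char array is handed over as raw 2-byte values; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class CharSizeMismatch {
    // Copies only the first byteCount bytes of src (as laid out in memory)
    // into a zero-filled char array of the same length, the way a transfer
    // sized at 1 byte per element would.
    static char[] roundTrip(char[] src, int byteCount, ByteOrder order) {
        ByteBuffer in = ByteBuffer.allocate(src.length * Character.BYTES).order(order);
        in.asCharBuffer().put(src);
        ByteBuffer out = ByteBuffer.allocate(src.length * Character.BYTES).order(order);
        for (int k = 0; k < byteCount; k++) {
            out.put(k, in.get(k));
        }
        char[] dst = new char[src.length];
        out.asCharBuffer().get(dst);
        return dst;
    }

    public static void main(String[] args) {
        char[] i = {'a', 'b', 'c', 'd', 'e'};
        // 5 elements * 1 byte: only 2.5 chars' worth of data is transferred.
        char[] o = roundTrip(i, i.length, ByteOrder.LITTLE_ENDIAN);
        for (char c : o) {
            System.out.print((int) c + " "); // prints 97 98 99 0 0
        }
        System.out.println();
    }
}
```

Under these assumptions the trailing elements come back as NUL characters, which would show up as strange symbols when printed. Sizing the transfer at 2 bytes per element (i.length * Character.BYTES) brings all five characters back.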

DONE!! This is one solution: I treated the char array as bytes and left the kernel arguments as char, and it works fine!!
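For anyone landing here later, a minimal sketch of the byte-based approach could look like the following. It only shows the host-side conversion and assumes the text is plain ASCII; the class and helper names are made up, and the resulting byte[] would then be handed to the buffer calls in place of the char[].

```java
import java.nio.charset.StandardCharsets;

class AsciiBytes {
    // One byte per character, matching the 1-byte char of the kernel.
    // Characters outside ASCII are replaced with '?' by this encoding.
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    // Turns the bytes read back from the output buffer into a String again.
    static String fromBytes(byte[] data) {
        return new String(data, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] input = toBytes("abcde");
        System.out.println(input.length);     // 5 bytes for 5 characters
        System.out.println(fromBytes(input)); // abcde
    }
}
```

With this, length * Sizeof.cl_char is the real size of the data in bytes, so the sizes used on the Java side and in the kernel agree.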

Thank you :slight_smile: