There are some errors in the code:
const char *kernel_source =
"__kernel void simple(\n"
" global read_only int* input,\n"
" global write_only int* output)\n"
"{\n"
" int index = get_global_id(0);\n"
" output[index] = index;\n"
"}\n";
read_only and write_only are access qualifiers for image arguments, not for pointers, so this kernel should not compile successfully.
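As a rough sketch of the fix, the kernel could drop the image qualifiers and mark the read-only pointer with const instead. The names kernel_source_fixed and has_image_access_qualifier are made up for this illustration and are not part of the original program. The helper is only a crude text check that suits a buffer-only kernel like this one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical corrected kernel: pointer arguments use an address space
   qualifier (__global) and, for read-only data, const. The image access
   qualifiers read_only / write_only are gone. */
const char *kernel_source_fixed =
    "__kernel void simple(\n"
    "    __global const int* input,\n"
    "    __global int* output)\n"
    "{\n"
    "    int index = get_global_id(0);\n"
    "    output[index] = index;\n"
    "}\n";

/* Crude text check (an assumption of this sketch, not an OpenCL API):
   returns 1 if the source mentions read_only or write_only anywhere. */
int has_image_access_qualifier(const char *src)
{
    return strstr(src, "read_only") != NULL ||
           strstr(src, "write_only") != NULL;
}
```

The real test is still the build log from clBuildProgram; the helper above only makes the difference between the two versions easy to see.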
input = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(data)*dataSize, &data, &err);
“sizeof(data)” means “sizeof(int*)”, i.e. the size of a pointer. That’s not what you want. What you want is “sizeof(cl_int)”, since the argument “input” of the kernel “simple()” points to elements of type “int”. Notice that on the host side you should always use data types with the prefix “cl_”, such as “cl_int”, to ensure that they have the same bit representation as the kernel data types.
Also, you are passing “&data” as the host pointer, which is the address of the pointer variable itself, when you actually want to pass “data”, the address of the array. This is probably the reason your program is crashing.
Finally, notice that since you have created this buffer with the flag CL_MEM_USE_HOST_PTR, the implementation will attempt to reuse the memory allocated in “data” instead of allocating new memory. What that means is that since the kernel will take the buffer “input” as an array of cl_int values, the variable “data” should be declared as having type “cl_int*” instead. That ensures that the size of the buffer will match on the host and the device.
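The two pointer mistakes above can be seen in plain C, without any OpenCL calls. This is only a sketch: cl_int_standin is a made-up stand-in for cl_int (assumed here to be a 32-bit integer) so that it builds without the OpenCL headers, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for cl_int (assumption: 32-bit signed integer). Real host code
   would include the OpenCL headers and use cl_int directly. */
typedef int32_t cl_int_standin;

/* Intended size: element size times element count. */
size_t buffer_bytes(size_t count)
{
    return sizeof(cl_int_standin) * count;
}

/* What the original expression computes when data is a pointer:
   pointer size times element count. */
size_t buffer_bytes_wrong(const cl_int_standin *data, size_t count)
{
    return sizeof(data) * count;
}

/* &data is the address of the pointer variable, data is the address of
   the array it points to. Returns 1 when the two differ. */
int address_of_pointer_differs(void)
{
    cl_int_standin storage[4] = {0};
    cl_int_standin *data = storage;
    return (void *)&data != (void *)data;
}
```

On a typical 64-bit host, buffer_bytes_wrong reports twice as many bytes as the array really holds, so the runtime is told the buffer is larger than the memory behind it.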
output = clCreateBuffer(context, CL_MEM_WRITE_ONLY, sizeof(data)*dataSize, NULL, &err);
Same as before: it should be “sizeof(cl_int)”.
err = clEnqueueWriteBuffer(queue, input, CL_TRUE, 0, sizeof(data)*dataSize, data, 0, NULL, NULL);
Please replace “sizeof(data)” with “sizeof(cl_int)”.
More importantly, this code is doing something very strange: it’s copying the contents of “data” into the buffer “input”. Why is this strange? Because “data” and “input” reference exactly the same memory. That’s what the CL_MEM_USE_HOST_PTR flag means.
err = clEnqueueCopyBuffer(queue, output, input, 0, 0, sizeof(data)*dataSize, 0, NULL, NULL);
I don’t understand the purpose of this. Note that clEnqueueCopyBuffer takes the source buffer first and the destination buffer second, so this call copies the values the kernel just wrote into “output” over the contents of “input”, which shares its memory with “data”. Is this what you intended?
Replace sizeof(data) with sizeof(cl_int).
err = clEnqueueReadBuffer(queue, output, CL_TRUE, 0, sizeof(data)*dataSize, dataOut, 0, NULL, NULL );
Same drill: replace sizeof.