pointer type conversion in kernel won't work

I’m at a loss to why the following doesn’t work in OpenCL, as it does work in ordinary C and in CUDA. I want to take a character pointer to global memory, make it a pointer to an unsigned short, take its content, convert it back to a char and then write it back to the content. The following code will give an CL_INVALID_COMMAND_QUEUE error, which I’ve learned basically means your kernel doesn’t work, whereas after uncommenting the commented line it works fine. The latter makes it unclear to me, since I don’t see the difference. (I’ve added the last line only to demonstrate that there’s not some kind of problem of self-reference.)

__kernel void test(__global char *test)
{
unsigned long idx = (get_global_id(1)*get_global_size(0) + get_global_id(0));
unsigned short *x;
*x = (unsigned short)(test+idx);
//*x = 300;
*(test+idx)=(char) *x;
*(test+idx)=(char) (unsigned short) *(test+idx);
}

The following code will give an CL_INVALID_COMMAND_QUEUE error, which I’ve learned basically means your kernel doesn’t work, whereas after uncommenting the commented line it works fine.

Variable “x” is a pointer that is never initialized and you are writing into it. That is causing an invalid memory access. A CL_INVALID_COMMAND_QUEUE often means “page fault”/“segmentation fault”/“invalid memory access”. Have you tried passing a pfn_notify function to clCreateContext()? Try doing that and you will get a more detailed error message than simply “CL_INVALID_COMMAND_QUEUE”.

Now let’s go back to your initial issue pointers to chars and pointers to shorts. Let’s see if I understand: you want to do something like this?


__kernel void foo(__global char* bar)
{
    size_t idx = (get_global_id(1)*get_global_size(0) + get_global_id(0));
    __global unsigned short* x = &bar[idx]; 

    bar[idx] = (char)x[0];
}

Notice two things: “x” must be defined as a pointer to __global memory as shown above. Second, even if you do this, there’s still an issue with memory alignment: OpenCL C requires that all data types must be naturally aligned. This means that, for example, shorts must always be located at a 2-byte memory boundary. If your kernel casts between pointers of different types, it has the responsibility of guaranteeing that the alignment requirements are not violated.

This is not the case in this example: “bar” is an array of chars, which are 1-byte variables, which means that if &bar[0] is an even memory address then
&bar[1] must be an odd memory address and vice versa. Therefore, if &bar[0] is correctly aligned for a short, then necessarily &bar[1] is not.

Programming in OpenCL C is hard unless you are already very familiar with plain C99 to begin with.

Thanks for the swift response.
First, you were of course right about the pointer, although it’s not the cause of the problem here.
Second, I agree that the error must be due to ‘invalid memory access’.
However, I don’t think it’s the alignment that causes a probem, but the fact that casting a pointer to a different type discards the address space qualifier. The following code did work:

__kernel void test(__global char *test)
{
unsigned long idx = (get_global_id(1)*get_global_size(0) + get_global_id(0));
__global unsigned short x = (unsigned short __global)(test+idx);
*(test+idx)=(char) x;
}

I initially had left out the __global qualifier on the left, because it gave an error, but that simply lead me further away from the solution.
It’s fixed now, but I still don’t understand why the original code does work with the uncommented line and not without it.

Be aware that what I said earlier about alignment is still true. The code you’ve posted is not portable. It may work in some architectures and it will fail in others. I do not recommend doing that.