Newbie question: what is special about conditions?

Hi!

I am trying to learn OpenCL.

What is special about conditions when writing OpenCL code?

In my humble opinion, the condition ((a+b)==(b+a)) should always be true if a and b are integers. But that condition is not always true in my kernels.

I have asked this on stack overflow:
http://stackoverflow.com/questions/1321 … put-buffer

I found this tutorial on DrDobbs:
http://www.drdobbs.com/parallel/a-gentl … 854?pgno=3
The tutorial contains an example program. When I compile and run that example program, it works, so I guess my environment is set up correctly. But when I put the condition that should always be true into the kernel, it doesn't behave as it should.

I changed this code in the DrDobbs example kernel:


   uint global_addr, local_addr;

   global_addr = get_global_id(0) * 2;
   input1 = data[global_addr];
   input2 = data[global_addr+1];
   sum_vector = input1 + input2;

To:


   uint global_addr, local_addr;

   global_addr = get_global_id(0) * 2;
   local_addr = global_addr + 1;
   input1 = data[global_addr];
   input2 = data[global_addr+1];
   if ((global_addr+local_addr) == (local_addr+global_addr))
       sum_vector = input1 + input2;
   else
       sum_vector = input1 - input2;

When that kernel is executed, the result is not the same as with the original code.

Do you write the sum to the data array, or does it go into another array?

Keep in mind that you don't know when a thread is called. The thread with id 50 might run right next to the one with id 60, and id 1 might be one of the last.

Thank you for the response.

Do you write the sum to the data array, or does it go into another array?

I am not sure right now, but I can try to look it up. But in my humble opinion, “x=do_something();” and “if (1) x=do_something();” should do the same thing, no matter what you do with x later. If not, feel free to explain why.

Keep in mind that you don't know when a thread is called.

This is an interesting aspect that I haven’t thought about. But to me it doesn’t explain why my code doesn’t work. The condition should always be true no matter what order is used.

What is “sum_vector”?

If it is something that every work item writes to, that's your problem. You can't do that. You could use an atomic operation instead, but it would be slow. Look up parallel reduction for tips on how to do it quickly.

In the future, post more complete examples.

The example code is taken from a DrDobbs tutorial. I doubt they made serious mistakes, such as writing to the same variable from many work items.

If you want a short and complete code example, we can discuss this code:

#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
__kernel void hello(__global char * out)
{
    size_t tid = get_global_id(0);
    int a = tid & 0xff;
    int b = (tid >> 8) & 0xff;
    out[ 1 & ((a+b)==(b+a)) ] = (char)1;
}

In my opinion the condition ((a+b)==(b+a)) should always be true. I read somewhere that the comparison operator will return 0xFFFFFFFF when the condition is true. So every work item should write a 1 to out[1].

The problem with that kernel is that a 1 is also written to out[0].

Is “a+b” sometimes not equal to “b+a” for some reason?

In my opinion the condition ((a+b)==(b+a)) should always be true. I read somewhere that the comparison operator will return 0xFFFFFFFF when the condition is true. So every work item should write a 1 to out[1].

To clarify: I believe ((a+b)==(b+a)) is always true and results in 0xFFFFFFFF. Therefore out[1 & ((a+b)==(b+a))] should be the same as out[1 & 0xFFFFFFFF], which is the same as out[1].

If I change the code in my kernel to out[1&((a+b)==(a+b))], then out[0] is never written. My guess is that the compiler sees that the comparison is always true (because the expressions on both sides of the == operator are identical) and simplifies it to out[1].

How do you know that out[0] is being written? Is something clearing it first? If the conditional is always true, then out[0] should not be written at all. If you suspect it is, then change the code to store the value of a in out[2] and the value of b in out[3] and figure it out from there. AMD supports printf in their kernels, and AMD, NVIDIA, and Intel have single-stepping debuggers you could use to figure this out.

Yes, the memory is cleared first.

To find out which values of “a” and “b” cause the problem, I used this code in a kernel:


char result = 0;
for (int a = 0; a < 2; a++) {
    for (int b = 0; b < 2; b++) {
        if ((a+b) != (b+a))
            result |= (1 << (a+2*b));
    }
}

I would expect result to be 0.

But result is 6. So for {a=0, b=1} and {a=1, b=0} the condition behaves unexpectedly.

I would really appreciate it if someone could tell me whether they can reproduce my problem.