goto statements in OpenCL

Are goto statements supported in OpenCL?

I saw a few posts about this, but they seem to be pretty old, so I am hoping to find out whether there are any updates. To provide some context: I am thinking of running finite state machine C code on the GPU. The code is generated at runtime on the CPU side from a given regex, and it contains goto statements.

Please let me know.

I cannot find anything in the standard that restricts goto relative to normal C. It might perform very poorly on GPUs, though; I assume you’re aware of that.

Thank you for the reply. Yes, I am aware that goto statements perform poorly on GPUs, but I am trying to understand the reason behind the poor performance.

Can you please tell me what makes goto statements perform poorly on GPUs?

Goto by itself is fine and efficient; it is the “finite state machine” part that concerns me. Each OpenCL work-item shares its instruction pointer with the rest of a group of 32 or 64 work-items (depending on the hardware). This means that even for a simple if-else in which half of the threads take one route and the other half take the other, the execution time effectively doubles, because the hardware runs both branches one after the other with the non-participating threads switched off. Somewhat like this:

bool x = condition();
turnOffThreadsNotFittingCriteria(x);
// code for “if”
turnOffThreadsNotFittingCriteria(!x);
// code for “else”
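To make the pseudocode above a bit more concrete, here is a rough sketch in plain C that imitates how a lockstep group might execute an if-else. It is only a CPU-side model, not real GPU behavior: the group width LANES, the function name run_if_else_lockstep and the even/odd branch bodies are all made up for illustration.

```c
#include <stdbool.h>

#define LANES 8 /* hypothetical group width; real hardware typically uses 32 or 64 */

/* Models "if (in is even) out = in / 2; else out = 3 * in + 1;" for one group.
   Returns how many branch bodies the whole group had to issue (1 or 2). */
int run_if_else_lockstep(const int in[LANES], int out[LANES])
{
    bool mask[LANES];
    bool any_true = false, any_false = false;
    int issued = 0;

    /* Every lane evaluates the condition; the result becomes its mask bit. */
    for (int lane = 0; lane < LANES; ++lane) {
        mask[lane] = (in[lane] % 2 == 0);
        if (mask[lane]) any_true = true; else any_false = true;
    }

    /* "if" body: issued once for the group, lanes with mask == false sit idle. */
    if (any_true) {
        for (int lane = 0; lane < LANES; ++lane)
            if (mask[lane]) out[lane] = in[lane] / 2;
        ++issued;
    }

    /* "else" body: issued once for the group, lanes with mask == true sit idle.
       It is skipped entirely when no lane needs it. */
    if (any_false) {
        for (int lane = 0; lane < LANES; ++lane)
            if (!mask[lane]) out[lane] = 3 * in[lane] + 1;
        ++issued;
    }

    return issued;
}
```

With an input where every lane is even, the function returns 1 because the “else” body is never issued; with mixed even and odd inputs it returns 2, which is the doubling described above.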

This will be much worse in the case of a state machine, unless you can guarantee that most of the time every thread in a 64-thread cluster is in the same state. In that case the GPU will easily jump over the unused branches of execution.
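As a back-of-the-envelope model of the state machine case, the sketch below counts how many different states the threads of one group occupy at a given step. Under the simplified lockstep model described here, that count is the number of state bodies the group has to run one after another. GROUP_SIZE, NUM_STATES and passes_needed are hypothetical names, and real hardware and compilers may behave differently.

```c
#define GROUP_SIZE 8 /* hypothetical; real groups are typically 32 or 64 wide */
#define NUM_STATES 4 /* hypothetical number of states in the machine */

/* state[lane] is the current FSM state of each thread, in 0..NUM_STATES-1.
   Returns the number of distinct states in the group, i.e. how many
   state bodies must be executed serially in this simplified model. */
int passes_needed(const int state[GROUP_SIZE])
{
    int seen[NUM_STATES] = {0};
    int passes = 0;

    for (int lane = 0; lane < GROUP_SIZE; ++lane) {
        if (!seen[state[lane]]) {
            seen[state[lane]] = 1;
            ++passes;
        }
    }
    return passes;
}
```

A group whose threads are all in state 2 needs a single pass, while a group spread over states 0, 1, 2 and 3 needs four passes for the same step, so in this model the slowdown grows with the number of states that are occupied at once.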

Then again, this may turn out to be uncharted territory: rarely used and therefore poorly tested. Perfectly valid code may refuse to work due to bugs in the kernel compiler (cough, AMD, cough).

Thank you for the explanation!

I am getting the gist of what’s happening in this case. Would you agree that it is a similar case when using switch-case statements?

Also, could you please share any reading material related to this?

Thanks!

Would you agree that it is a similar case when using switch-case statements?

Switch-case is essentially goto, so yeah.
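To illustrate, here is the same tiny state machine written both ways in plain C. The regex ab*c and the names match_goto and match_switch are made up for this example, and the code is a CPU-side sketch: inside an actual OpenCL kernel the input pointer would also need an address space qualifier.

```c
#include <stdbool.h>

/* goto version: each label is a state, each goto is a transition. */
bool match_goto(const char *s)
{
    /* start state: expect 'a' */
    if (*s++ == 'a') goto state_body;
    return false;

state_body: /* zero or more 'b', then a single 'c' */
    if (*s == 'b') { ++s; goto state_body; }
    if (*s == 'c') { ++s; goto state_end; }
    return false;

state_end: /* accept only if the input ends here */
    return *s == '\0';
}

enum fsm_state { START, BODY, END };

/* switch version: the state lives in a variable and the loop re-dispatches. */
bool match_switch(const char *s)
{
    enum fsm_state st = START;

    for (;;) {
        switch (st) {
        case START:
            if (*s++ != 'a') return false;
            st = BODY;
            break;
        case BODY:
            if (*s == 'b') { ++s; }
            else if (*s == 'c') { ++s; st = END; }
            else return false;
            break;
        case END:
            return *s == '\0';
        }
    }
}
```

Both functions accept "ac" and "abbc" and reject "abx" and "abcc". Either way, threads of the same group that sit in different states end up on different code paths, so the divergence concern is the same.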

You should find plenty of info on this topic and more by googling “CUDA optimization guide”, “AMD GPU optimization guide” or “Intel GPU optimization guide”. Any of the three will do.