According to the book "OpenCL in Action":

"Comparisons are time-consuming on the best of processors, but they’re especially slow on dedicated number-crunchers like graphic processor units (GPUs). GPUs excel at performing the same operations over and over again, but they’re not good at making decisions. If a GPU has to check a condition and branch, it may take hundreds of cycles before it can get back to crunching numbers at full speed."

Yet this great book contains a few samples whose kernels use 'for' loops:
matrix transposition: page 261
matrix multiplication: page 264
DFT: page 314

My question is: is it possible to avoid 'for' and 'while' loops in kernel functions?
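
To make the question concrete, here is a rough sketch of the pattern I mean. It is plain C that I wrote myself, not the listing from the book: `matmul_item` stands in for a kernel body, and `launch_matmul` is a hypothetical stand-in for enqueueing an n x n range of work-items (on a real device `row` and `col` would come from `get_global_id`). The outer loops over rows and columns disappear into the work-item grid, but the inner loop over `k` stays. As far as I can tell its bound is the same for every work-item, so I am not sure whether the branching penalty quoted above applies to it.

```c
#include <stddef.h>

/* Emulated kernel body: one "work-item" computes one element c[row][col].
   In OpenCL C, row and col would be get_global_id(1) and get_global_id(0);
   here they are plain parameters (an assumption of this sketch). */
static void matmul_item(const float *a, const float *b, float *c,
                        size_t n, size_t row, size_t col)
{
    float sum = 0.0f;
    /* The loop I am asking about: every work-item runs exactly n iterations. */
    for (size_t k = 0; k < n; ++k)
        sum += a[row * n + k] * b[k * n + col];
    c[row * n + col] = sum;
}

/* Hypothetical stand-in for launching an n x n grid of work-items.
   On a device these calls would run in parallel, not one after another. */
static void launch_matmul(const float *a, const float *b, float *c, size_t n)
{
    for (size_t row = 0; row < n; ++row)
        for (size_t col = 0; col < n; ++col)
            matmul_item(a, b, c, n, row, col);
}
```

So the question is really about the `for` over `k`: can it be removed, or is a loop like this acceptable inside a kernel?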

Another question: let's say I have only 5 work-groups. I take this to mean that I need 5 cores.
Am I right?
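
To be explicit about where a number like 5 comes from, here is a small sketch in plain C. The sizes are made up for illustration (a global size of 320 work-items and a local size of 64), and `work_group_count` is a hypothetical helper of mine, not an OpenCL function. It only counts the groups; what I do not know is how those groups map onto cores.

```c
#include <stddef.h>

/* Hypothetical helper: number of work-groups in one dimension.
   Returns 0 when local_size is 0 or does not divide global_size evenly,
   which this sketch treats as an invalid launch configuration. */
static size_t work_group_count(size_t global_size, size_t local_size)
{
    if (local_size == 0 || global_size % local_size != 0)
        return 0;
    return global_size / local_size;
}
```

With the made-up sizes above, `work_group_count(320, 64)` gives the 5 work-groups from my question.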