Compiler settings: the key to the best performance

Hey Everyone,

I'm very new here, and new to OpenCL as well.

I'm trying to understand how it works, and I keep running into problems that look very illogical at first glance.

I solved some of them, and I saw that the most common "issue" is the compiler's "attitude", I mean the optimizations it makes.

For example, take two versions of the same code:

case 1: int ITER is declared as an argument to the kernel function and set to 1.0e4
case 2: ITER is defined as a macro passed to the program at build time (the same value, 1.0e4)

for (int i = 0; i < ITER; ++i) {
// some computations, doesn’t matter
}

case 1: 100 FPS
case 2: 700 FPS
Of course the absolute FPS doesn't matter, only how big the difference is.
It's all because of the optimization done by the compiler:
in case 1 it prepares the code as if there won't be many iterations,
in case 2 it knows the number of iterations.

Second example:
for (int i = 0; i < ITER; ++i) {
    // case 1:
    if (i == ITER)   // notice: this never happens!
        barrier(CLK_LOCAL_MEM_FENCE);
    // case 2 (instead of the lines above):
    if (i == ITER)
        { /* some things... it doesn't matter */ }
}

The result?
case 1: 200 FPS
case 2: 700 FPS
All because of instructions that will never be executed...

And the question is: is there any way to "set" the compiler to do exactly what we want?
Especially in the second example: why does it respond this way to this instruction?

PS: Sorry for my English, I hope it's readable.

The function clBuildProgram takes an options string (a const char *) as one of its arguments. This string passes parameters to the compiler, including options that control optimisation. Try passing "-cl-opt-disable" to disable all optimisations.
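As a rough sketch of how such an options string might be assembled on the host side: the helper name make_build_options and its parameters are hypothetical, and only the "-D name=value" and "-cl-opt-disable" options themselves come from the OpenCL build-option syntax. The snippet is plain C so that it compiles without the OpenCL headers.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of OpenCL): formats a build-options string
 * that defines ITER as a macro and optionally disables optimisations.
 * Returns 0 on success, -1 if the buffer is too small. */
static int make_build_options(char *buf, size_t size, int iter, int disable_opt)
{
    int n = snprintf(buf, size, "-D ITER=%d%s", iter,
                     disable_opt ? " -cl-opt-disable" : "");
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* With the OpenCL headers available, the resulting string would be passed
 * as the fourth argument of clBuildProgram, for example:
 *     clBuildProgram(program, 0, NULL, options, NULL, NULL);
 */
```

Building once with and once without "-cl-opt-disable" is a quick way to check whether a timing difference really comes from the optimiser.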

I don’t completely follow your second example. Are the two cases the following:
Case 1:


for (int i = 0; i < ITER; ++i) {
   if (i == ITER) // notice- it'll never happen!
      barrier(CLK_LOCAL_MEM_FENCE);
   /* Rest of code */
}

Case 2:


for (int i = 0; i < ITER; ++i) {
   /* Rest of code */
}

I would expect case 1 to be slightly slower if the if-statement isn’t optimised out, or to take the same amount of time as case 2 if the if-statement is removed by the compiler. Try using the profiling tool appropriate to your development environment to see what is happening.

In case 2 of the first example (the macro) the compiler has a constant, so it can unroll the loop with known exact bounds, and it can completely remove any conditional code it knows cannot be reached.

In case 1 (the kernel argument), the loop bounds are unknown at compile time, so it cannot unroll the loop as precisely, and it cannot assume anything about the range (well maybe it can, but C allows the loop index to be set to anything).
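To make the difference concrete, here is a plain C sketch (not an OpenCL kernel) of the two variants from the original post: the loop bound as a function argument versus the loop bound as a macro. The function names and the trivial loop body are made up for illustration, and what a given compiler actually does with either loop is not guaranteed.

```c
/* ITER would normally come from the build options, e.g. "-D ITER=10000";
 * the fallback here only keeps the sketch self-contained. */
#ifndef ITER
#define ITER 10000
#endif

/* Bound passed as an argument: the trip count is only known at run time,
 * so the compiler has to emit a general loop. */
static long sum_with_argument(int iter)
{
    long acc = 0;
    for (int i = 0; i < iter; ++i)
        acc += i;
    return acc;
}

/* Bound supplied as a macro: the trip count is a compile-time constant,
 * so the compiler may unroll the loop and drop unreachable branches. */
static long sum_with_macro(void)
{
    long acc = 0;
    for (int i = 0; i < ITER; ++i) {
        if (i == ITER)   /* provably false inside the loop: dead code */
            acc = -1;
        acc += i;
    }
    return acc;
}
```

Both functions compute the same value; only the information available to the compiler differs.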

But your barrier usage is invalid anyway, so any results you get are pretty meaningless. Every work-item in a work-group needs to execute the same barrier on every loop iteration, so barriers should not be conditional to start with.

This isn't about compiler settings. It's about a) writing valid code, and b) using constants where possible to help the compiler optimise any calculations associated with the loop index.