Thread: If statements in shaders, confused by results.

  1. #1
    Junior Member Newbie
    Join Date
    Jun 2015
    Posts
    8

    If statements in shaders, confused by results.

    Hi!
    I have some questions about performance that I'm quite confused about. Wherever I read, I get the impression that if-statements are a big no-no unless you really need them.
    As I understand it, this is because the GPU simply evaluates both branches of an "if-else" and just discards the result of the one on the losing side of the condition.
    For example, if I do something like:

    Code :
    float someValue = value1 * value2;
    if(someValue > someOthervalue)
    {
    	// ... do some calculations "c1" ...
    }
    else
    {
    	// ... do some other calculations "c2" ...
    }

    ...both calculations "c1" and "c2" get evaluated, so performance takes a hit. (At least that's what I thought.)

    What I found was that if calculation "c1" was very expensive and "c2" was very cheap, having the condition there to stop some fragments from entering the "c1" calculation actually had a positive effect on performance.
    How is that possible? Have I completely misunderstood something?
    My scene took about 9.2 ms per frame to render before I started optimizing, and after removing all branches from the code I ended up with 10 ms per frame, which is really not what I wanted.

    The code that was running (excluding the implementations of the functions) was the following. Removing these if-statements "should" not have a negative impact on performance, right?

    Code :
    void main()
    {
    	vec3 color = vec3(0,0,0);
     
    	E = normalize(E_in);
    	L = normalize(L_in);
    	H = normalize(H_in);	
     
    	float spotFactor = dot(L, -lightDirection_fs);
     
    	// Inside spot cone
    	if(spotFactor > lightCutoff)
    	{
    		vec2 UVs = GetTextureCoords(E);
     
    		vec3 N = FindNormal(E, UVs);
    		float visibility = GetShadowValue();
    		float lightAmount = GetAttenuationValue();
     
    		vec3 diffuse = vec3(0,0,0);
    		vec3 specular = vec3(0,0,0);
     
    		if(lightAmount >= 0.01)
    		{
    			diffuse = CalculateDiffuseLight(N, UVs); // assign to the outer variable instead of shadowing it
    			if(visibility == 1.0)
    				specular = CalculateSpecularLightBlinnPhong(N, UVs);
    			color = (diffuse + specular) * visibility * lightAmount;
    			color *= (1.0 - (1.0 - spotFactor)/(1.0 - lightCutoff));
    		}
    	}
    	gl_FragColor = vec4(color, alpha);
    }

  2. #2
    Senior Member OpenGL Guru
    Join Date
    Jun 2013
    Posts
    3,112
    Quote Originally Posted by Patrikwa
    As I understand it, this is because the GPU simply evaluates both branches of an "if-else" and just discards the result of the one on the losing side of the condition.
    Not necessarily.

    Modern hardware can perform an actual branch if the condition has the same value for all elements in a SIMD block (what NVIDIA calls a "warp" and AMD a "wavefront").

  3. #3
    Junior Member Newbie
    Join Date
    Jun 2015
    Posts
    8
    Quote Originally Posted by GClements
    Not necessarily.

    Modern hardware can perform an actual branch if the condition has the same value for all elements in a SIMD block (what NVIDIA calls a "warp" and AMD a "wavefront").
    Thanks for the information!
    Could you elaborate on this statement?
    If I, for example, use a uniform and compare it to a static number (I used a lot of "if(useTexture == 1)" and similar, which I tried removing), would that count as something modern hardware can actually branch on?
    Any specific scenarios that cannot be warped?
    What is "modern" hardware in this sense? I need to support "older" hardware, back to at least around 2007.
    I'm sitting at a GTX 970, so I suppose it can handle this "Warp" technology, and that's why I get these unexpected results? (Well, unexpected for me at least.)

  4. #4
    Member Contributor
    Join Date
    Mar 2014
    Posts
    59
    Why not remove the last if?
    If your lightAmount is small, you don't need to compute illumination...

    Or remove all the ifs... if your lightAmount is that small, the result will be nearly vec3(0)... depending on the cost of FindNormal(E, UVs), GetShadowValue() and GetAttenuationValue().

    I use fragment shaders with a lot of "if"s in a "while" loop on a GTX 980 without any slowdown... maybe there is another problem...

  5. #5
    Senior Member OpenGL Lord
    Join Date
    Mar 2015
    Posts
    6,678
    Quote Originally Posted by Patrikwa
    Thanks for the information!
    Could you elaborate on this statement?
    If I, for example, use a uniform and compare it to a static number (I used a lot of "if(useTexture == 1)" and similar, which I tried removing), would that count as something modern hardware can actually branch on?
    Yes, it would count as a branch. That doesn't mean it's bad; all of the different instances will take the same branch, so the cost is minimal.

    Remember: the only problem with branching in a shader is if different instances executing on the same computational unit have to take different paths.

    Quote Originally Posted by Patrikwa
    Any specific scenarios that cannot be warped?
    Just look at it from the perspective of the hardware. A group of instances needs to be broken up if, and only if, the conditional expression can evaluate differently for different instances in the same rendering command, and a particular computation actually results in neighboring instances taking different paths.

    For example, all fragment shader instances generated for the same primitive get the same gl_PrimitiveID value (as well as the same `flat in` interpolated values). Conditions based on those will not be statically uniform, but they will be uniform within each "warp/wavefront". Therefore, neighboring instances will always take the same path, so conditions based on them will be reasonably fast.

    Even if a condition is based on interpolated input parameters, that alone doesn't mean that your rendering will be slow. You only pay the price performance-wise for those specific instances where the runtime condition forces "warp/wavefronts" to actually be broken up. So if you were rendering a full-screen quad, and you're doing a condition based on being on the left half of the quad rather than the right, then only those "warp/wavefronts" in the middle will be slower.
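
    A minimal sketch of that full-screen-quad case, assuming GLSL 3.30 and hypothetical names (uv, fragColor, expensiveShading, cheapShading) that are not from this thread. The condition is per-fragment, but only the groups of fragments that straddle uv.x == 0.5 would contain both outcomes; everywhere else a whole group takes one path:

    Code :
    #version 330 core
    // Hypothetical inputs/outputs; uv is assumed to run from 0 to 1 across the quad.
    in vec2 uv;
    out vec4 fragColor;

    // Stand-in for costly work: a loop of trig evaluations.
    vec3 expensiveShading(vec2 p)
    {
        vec3 c = vec3(0.0);
        for (int i = 0; i < 64; ++i)
            c += 0.5 + 0.5 * sin(vec3(p, p.x + p.y) * float(i + 1));
        return c / 64.0;
    }

    // Stand-in for the cheap path.
    vec3 cheapShading(vec2 p)
    {
        return vec3(p, 0.0);
    }

    void main()
    {
        vec3 color;
        // Left half takes the expensive path, right half the cheap one.
        if (uv.x < 0.5)
            color = expensiveShading(uv);
        else
            color = cheapShading(uv);
        fragColor = vec4(color, 1.0);
    }

    How wide the slower strip in the middle actually is depends on the group size and shape the implementation uses, so it is something to measure rather than assume.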

    Don't be afraid of conditions. Be aware of them, and use them judiciously, but don't assume that conditions are always (or even usually) terrible.

    Quote Originally Posted by Patrikwa
    What is "modern" hardware in this sense? I need to support "older" hardware, back to at least around 2007.
    I would say that, for the purposes of this discussion, modern would be anything DX10 or better. That's about 2008 or so. That was the point when unified shader architectures became the norm. Even so, older hardware had similar properties, at least with respect to static/uniform branching.

    Quote Originally Posted by Patrikwa
    I'm sitting at a GTX 970, so I suppose it can handle this "Warp" technology
    It's not "technology"; it's "terminology". That's just NVIDIA's name for the group of instances that execute together on the same computational unit.

  6. #6
    Junior Member Newbie
    Join Date
    Jun 2015
    Posts
    8
    Quote Originally Posted by __bob__
    Why not remove the last if?
    If your lightAmount is small, you don't need to compute illumination...

    Or remove all the ifs... if your lightAmount is that small, the result will be nearly vec3(0)... depending on the cost of FindNormal(E, UVs), GetShadowValue() and GetAttenuationValue().

    I use fragment shaders with a lot of "if"s in a "while" loop on a GTX 980 without any slowdown... maybe there is another problem...
    Yeah, as I said, I'm trying to optimize the code by removing if-statements, but when I do, I get worse performance, which is the exact opposite of what I thought I knew about the graphics pipeline.
    I'm not sure what you mean otherwise. If I remove the last "if", then I would do all the lighting calculations even though the fragment is in complete darkness, which seems quite unnecessary.

  7. #7
    Junior Member Newbie
    Join Date
    Jun 2015
    Posts
    8
    Thanks a lot for the elaboration on the subject, Alfonse Reinheart, it really helps!
    I thought for a minute there that I had done something wrong with my testing, since most of the conditions did not affect performance in either direction.

    In light of this new (for me) information, in the case of multiple shaders vs. a single shader with conditions, which would actually be preferred performance-wise? Let's say a basic scenario where I need to either calculate the TBN matrix in the vertex shader or not, depending on whether displacement mapping is active. I could of course test the different scenarios myself and compare the actual render times, but it would be interesting to hear someone's theories on the matter.
    So basically: binding 2 different shaders, or using 1 shader and letting a uniform control whether some code is executed by using conditions. I've seen this debated before, but without any final result.

  8. #8
    Senior Member OpenGL Guru
    Join Date
    Jun 2013
    Posts
    3,112
    Quote Originally Posted by Patrikwa
    If I, for example, use a uniform and compare it to a static number (I used a lot of "if(useTexture == 1)" and similar, which I tried removing), would that count as something modern hardware can actually branch on?
    That's something that even old hardware can handle: the driver can compile different versions of the shader for useTexture==1 and useTexture!=1, and select the appropriate version for each draw call.
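
    The same kind of specialisation can also be done by hand with the preprocessor instead of relying on the driver. This is only a sketch, assuming GLSL 3.30; USE_TEXTURE, diffuseMap, baseColor, uv and fragColor are hypothetical names. The application would compile this source twice, once with "#define USE_TEXTURE" inserted after the #version line and once without, and bind whichever program the draw call needs:

    Code :
    #version 330 core
    // Hypothetical names throughout; USE_TEXTURE is supplied (or not) by the application.
    uniform sampler2D diffuseMap;
    uniform vec3 baseColor;
    in vec2 uv;
    out vec4 fragColor;

    void main()
    {
        vec3 color = baseColor;
    #ifdef USE_TEXTURE
        // Present only in the textured variant; the other variant contains no branch at all.
        color *= texture(diffuseMap, uv).rgb;
    #endif
        fragColor = vec4(color, 1.0);
    }

    Whether two programs beat one program with a uniform-controlled condition depends on how often you would have to switch programs, so treat this as something to profile.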

    Expressions fall into three basic types:
    1. Statically-uniform, where the expression's value is constant for an entire draw call.
    2. Dynamically-uniform, where the expression's value is constant for all "threads" within a work group (warp, wavefront). In the fragment shader, this includes variables with "flat" interpolation, gl_PrimitiveID, gl_Layer, etc.
    3. Non-uniform, where the value is different for different vertices or fragments within the same work group.

    All hardware can optimise branches (i.e. only evaluate the branch actually taken) for case 1. More modern hardware can also do so for case 2. Case 3 requires that both branches are executed, with results being discarded within the branch not taken.
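
    A rough sketch with one condition of each type in a single fragment shader, assuming GLSL 3.30. All names here (useTexture, materialIndex, distanceToLight and so on) are hypothetical and chosen only to illustrate the three cases:

    Code :
    #version 330 core
    uniform int useTexture;        // case 1: one value for the whole draw call
    uniform sampler2D diffuseMap;
    uniform float lightRange;
    flat in int materialIndex;     // case 2: one value for every fragment of a primitive
    in vec2 uv;
    in float distanceToLight;      // case 3: interpolated, may differ between neighbours
    out vec4 fragColor;

    void main()
    {
        // Sampled unconditionally so the implicit derivatives are well defined.
        vec3 texel = texture(diffuseMap, uv).rgb;
        vec3 color = vec3(1.0);

        // Case 1: statically uniform condition.
        if (useTexture == 1)
            color *= texel;

        // Case 2: uniform within a group of fragments from the same primitive.
        if (materialIndex == 0)
            color *= vec3(1.0, 0.8, 0.6);

        // Case 3: non-uniform condition; neighbouring fragments may disagree.
        if (distanceToLight > lightRange)
            color = vec3(0.0);

        fragColor = vec4(color, 1.0);
    }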

    To get better results in all cases, ensure that any common calculations are lifted out of the conditional, so you don't end up performing essentially the same computation twice in the case both branches are executed. If you have simple and complex cases, lifting common subexpressions may result in the branch for the simple case being empty, in which case the issue of "executing both branches" doesn't arise.
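
    As an illustration of lifting the common part out, here is a sketch assuming GLSL 3.30 with hypothetical inputs (normal, lightDir, halfVec, visibility) and a made-up specular exponent of 32. The diffuse term is needed on both paths, so it is computed once; only the extra specular work stays inside the conditional, and the "else" side ends up empty:

    Code :
    #version 330 core
    // Hypothetical interpolated inputs.
    in vec3 normal;
    in vec3 lightDir;
    in vec3 halfVec;
    in float visibility;
    out vec4 fragColor;

    void main()
    {
        vec3 N = normalize(normal);

        // Shared by both paths, so computed once outside the conditional.
        float diffuse = max(dot(N, normalize(lightDir)), 0.0);
        vec3 color = vec3(diffuse);

        // Only the additional work is conditional; there is no else branch to execute.
        if (visibility == 1.0)
            color += vec3(pow(max(dot(N, normalize(halfVec)), 0.0), 32.0));

        fragColor = vec4(color * visibility, 1.0);
    }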

    Aside from performance, certain operations are undefined within non-uniform control flow, specifically derivatives (the dFdx(), dFdy() and fwidth() functions), as well as sampling mipmapped textures (as that implicitly uses derivatives). So if you're accessing textures with those functions, the call needs to be outside of any conditional statement even if only one branch actually needs the texture data.
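
    A minimal sketch of moving the texture access out of the conditional, assuming GLSL 3.30 and hypothetical names (diffuseMap, lightCutoff, uv, spotFactor, fragColor). The sample is taken before the per-fragment test, so the implicit derivatives are computed while all fragments are still on the same path:

    Code :
    #version 330 core
    // Hypothetical names; spotFactor is assumed to be an interpolated input.
    uniform sampler2D diffuseMap;
    uniform float lightCutoff;
    in vec2 uv;
    in float spotFactor;
    out vec4 fragColor;

    void main()
    {
        // Implicit-derivative sample, done outside any non-uniform conditional.
        vec3 albedo = texture(diffuseMap, uv).rgb;

        vec3 color = vec3(0.0);
        // The non-uniform branch only uses the already-fetched value.
        if (spotFactor > lightCutoff)
            color = albedo * spotFactor;

        fragColor = vec4(color, 1.0);
    }

    The cost is that fragments outside the cone still pay for the fetch, which is the trade-off against getting undefined results.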

  9. #9
    Junior Member Newbie
    Join Date
    Jun 2015
    Posts
    8
    Great stuff GClements!
    This is one helpful place for sure

    Quote Originally Posted by GClements
    ...where the expression's value is constant for all "threads" within a work group (warp, wavefront)
    What defines a work group here? Is each triangle a work group? Or each array of vertices in a VAO?

    Quote Originally Posted by GClements
    ...as well as sampling mipmapped textures (as that implicitly uses derivatives)
    I'm generating mipmaps for my textures and access them through the standard texture2D in the shader. I assume this means it's using derivatives to sample, then?

    Quote Originally Posted by GClements
    2. Dynamically-uniform, where the expression's value is constant for all "threads" within a work group (warp, wavefront). In the fragment shader, this includes variables with "flat" interpolation, gl_PrimitiveID, gl_Layer, etc.
    3. Non-uniform, where the value is different for different vertices or fragments within the same work group.
    In the case of a fragment shader, would this condition fall into category 1, 2 or 3? The value "distanceToPoint" here obviously changes from vertex to vertex; however, "lightRange" is a static uniform that changes between draws (since I'm running forward rendering, this is executed for every light per fragment).

    Code :
    	float distanceToPoint = length(lightPosition - vertexWorldSpace);
    	if(distanceToPoint <= lightRange)
    	{
    		// Squared distance via dot(); pow(x, 2.0) is undefined in GLSL when x < 0.
    		vec3 toLight = (lightPosition - vertexWorldSpace).xyz;
    		float sqDist = dot(toLight, toLight);
    		float cAttenuation = lightAttenuation.x + (lightAttenuation.y * 0.001 * sqrt(sqDist)) + (lightAttenuation.z  * 0.00001 * sqDist);
    		return min(5.0, 1.0 / cAttenuation);
    	}
    	else
    		return 0.0;

  10. #10
    Senior Member OpenGL Guru
    Join Date
    Jun 2013
    Posts
    3,112
    Quote Originally Posted by Patrikwa
    What defines a work group here? Is each triangle a work group? Or each array of vertices in a VAO?
    In the fragment shader, it's typically a rectangular "block" of fragments, with the size determined by the implementation (32 or 64 is typical). All of the fragments will belong to the same primitive (i.e. gl_PrimitiveID will be the same for all fragments in the work group, as will any "flat"-qualified inputs).

    In the vertex shader, it's some number of vertices, again with the number determined by the implementation. All vertices will correspond to the same draw call.

    Quote Originally Posted by Patrikwa
    I'm generating mipmaps for my textures and access them through the standard texture2D in the shader. I assume this means it's using derivatives to sample, then?
    Yes. The texture functions with "Lod" or "Grad" in the name take an explicit level-of-detail or explicit derivatives from which the level-of-detail is calculated. The other functions are equivalent to calling the corresponding "Grad" function with the derivatives obtained using dFdx() and dFdy(), so these are undefined within non-uniform control flow (dFdx() and dFdy() calculate the difference between the value for the current fragment and the value for a horizontally- or vertically-adjacent fragment; within non-uniform control flow, the value for adjacent fragments may be garbage if those fragments take a different branch).
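
    If the fetch really has to stay inside the branch, one option is to take the derivatives beforehand and pass them explicitly. This is a sketch only, assuming GLSL 3.30 (where textureGrad is available) and hypothetical names (diffuseMap, lightRange, uv, distanceToPoint as an interpolated input, fragColor):

    Code :
    #version 330 core
    // Hypothetical names; distanceToPoint is assumed to arrive as an interpolated input.
    uniform sampler2D diffuseMap;
    uniform float lightRange;
    in vec2 uv;
    in float distanceToPoint;
    out vec4 fragColor;

    void main()
    {
        // Derivatives are taken here, before any non-uniform branch.
        vec2 duvdx = dFdx(uv);
        vec2 duvdy = dFdy(uv);

        vec3 color = vec3(0.0);
        if (distanceToPoint <= lightRange)
        {
            // Explicit derivatives, so nothing is read from neighbours inside the branch.
            color = textureGrad(diffuseMap, uv, duvdx, duvdy).rgb;
        }

        fragColor = vec4(color, 1.0);
    }

    With this arrangement, fragments that fail the test skip the fetch, while the level-of-detail is still calculated from well-defined derivatives.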

    Quote Originally Posted by Patrikwa
    In the case of a fragment shader, would this condition fall into category 1, 2 or 3? The value "distanceToPoint" here obviously changes from vertex to vertex; however, "lightRange" is a static uniform that changes between draws (since I'm running forward rendering, this is executed for every light per fragment).

    Code :
    	float distanceToPoint = length(lightPosition - vertexWorldSpace);
    	if(distanceToPoint <= lightRange)
    	{
    This falls between cases 2 and 3. Formally, it's non-uniform control flow (case 3), as different fragments within the same work group can have different values for the comparison. Consequently, the result of texture() would be undefined within the branches.

    However, the values will typically be highly correlated, i.e. adjacent fragments will often have similar values for distanceToPoint. In many cases, all fragments within a work group will have the same value for the result of the comparison, and thus a modern GPU will only execute one branch. It's only in the case where the block of fragments forming the work group lies on the boundary that the GPU will need to execute both branches (which doesn't really matter anyhow, as the "else" branch is trivial).
