Shader compiler fragility?

After releasing transparency capabilities in GlowScript (glowscript.org), I discovered that neither transparency nor mouse picking worked on Ubuntu or on a MacBook Pro running Snow Leopard (both work fine on Lion). I had to work through various strange shader problems. Consider the pick machinery, which is simpler than, but related to, the transparency case (the two share the same vertex shader). In picking, each object is given a false color that encodes the id of that object. No lighting is applied, so every pixel of an object has the color representing that object's id, and readPixels yields the id of the object currently under the mouse.
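For readers new to the technique, here is a minimal sketch of how an object id can be packed into a false color and recovered from the pixel that gl.readPixels returns. It assumes ids fit in 24 bits stored in the red, green, and blue channels; the function names are hypothetical, and GlowScript's actual encoding may differ.

```javascript
// Hypothetical helpers illustrating id <-> false-color packing for picking.
// Assumes ids fit in 24 bits (red, green, blue) and alpha is always 1.

// CPU side: turn an object id into an RGBA color in [0, 1] for the shaders.
function idToColor(id) {
  if (!Number.isInteger(id) || id < 0 || id > 0xffffff) {
    throw new RangeError("pick id must be an integer in [0, 0xffffff]");
  }
  return [
    ((id >> 16) & 0xff) / 255,
    ((id >> 8) & 0xff) / 255,
    (id & 0xff) / 255,
    1.0,
  ];
}

// CPU side: decode the 4 bytes that
// gl.readPixels(x, y, 1, 1, gl.RGBA, gl.UNSIGNED_BYTE, pixel) wrote into pixel.
function pixelToId(pixel) {
  return (pixel[0] << 16) | (pixel[1] << 8) | pixel[2];
}

// Stand-in for the GPU: an 8-bit framebuffer stores each channel as round(c * 255).
function colorToPixel(color) {
  return Uint8Array.from(color, (c) => Math.round(c * 255));
}

const pixel = colorToPixel(idToColor(70000));
console.log(pixelToId(pixel)); // 70000
```

Exact round-tripping assumes that lighting, blending, and antialiasing are all off in pick mode, so each channel reaches the framebuffer unmodified; reserving id 0 for the background (cleared to black) is one convenient choice.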

Start with the vertex and fragment shaders used to render a simple opaque scene; these work on all platforms. That is, replace the pick shaders with those used for normal scenes.

In pick mode the CPU sends false-color data to the GPU. The vertex shader need not change, but the fragment shader must be changed to eliminate lighting calculations, since all pixels of an object need to have the same false color, representing the id of that object. A minimal change is to leave the call to the lighting calculation in place and simply change the assignment to gl_FragColor to use the color coming from the vertex shader. (Note that the color is a varying whose value in fact doesn't change across an object, because the color is the same for all vertices in a GlowScript object.) This fails on Ubuntu, even though the only change is to the gl_FragColor assignment. I had to give up on Ubuntu, at least for now.
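When a shader silently renders nothing on one platform, a first thing worth checking is whether that platform's compiler or linker rejected it. Below is a minimal sketch using the standard WebGL status and info-log calls; the helper names compileShader and linkProgram are hypothetical. It may not catch the harder case, where a shader compiles and links without complaint but still produces wrong output.

```javascript
// Compile one shader and surface the driver's error log instead of failing silently.
function compileShader(gl, type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    const log = gl.getShaderInfoLog(shader);
    gl.deleteShader(shader);
    throw new Error("Shader compile failed: " + log);
  }
  return shader;
}

// Compile both stages, link them, and report any link-time error.
function linkProgram(gl, vertexSource, fragmentSource) {
  const program = gl.createProgram();
  gl.attachShader(program, compileShader(gl, gl.VERTEX_SHADER, vertexSource));
  gl.attachShader(program, compileShader(gl, gl.FRAGMENT_SHADER, fragmentSource));
  gl.linkProgram(program);
  if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
    const log = gl.getProgramInfoLog(program);
    gl.deleteProgram(program);
    throw new Error("Program link failed: " + log);
  }
  return program;
}
```

Logging the error message per platform makes it easier to tell a compiler rejection apart from a shader that compiles but misbehaves.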

Because I want the pick calculation to be as fast as possible, I tried to strip as much unnecessary code as I could from both the vertex and fragment shaders. I fairly quickly reduced the fragment shader to the following, which works on Snow Leopard:

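#ifdef GL_ES
precision highp float;  // GLSL ES fragment shaders have no default float precision
#endif

varying vec4 mat_color;  // false color (object id) passed from the vertex shader

void main(void) {
    gl_FragColor = mat_color;  // no lighting: every pixel of an object gets its id color
}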

I then started trying to eliminate unnecessary code from the vertex shader used for picking. The process was very frustrating. Just about any scheme continued to work on Windows, but Snow Leopard was very unforgiving. It took many hours of trial-and-error testing to slim down the vertex shader and still have picking work on the Mac. Because depth peeling uses the same vertex shader, once I got picking to work I was also able to get depth peeling to work.
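The reduced vertex shader is not shown here. As a point of reference only, a pick vertex shader that pairs with the fragment shader above needs little more than a position transform and a pass-through of the false color. The attribute and uniform names (pos, color, viewMatrix, projMatrix) are hypothetical, not GlowScript's, and as the experience described here shows, a shader this minimal is not guaranteed to behave identically on every driver.

```javascript
// Hypothetical minimal pick vertex shader (GLSL ES 1.00), held as a JavaScript
// string the way WebGL code usually supplies shader source.
// It must write the same varying (mat_color) that the pick fragment shader reads.
const PICK_VERTEX_SHADER = `
attribute vec3 pos;        // object-space vertex position
attribute vec4 color;      // false color encoding the object id
uniform mat4 viewMatrix;   // hypothetical camera transform
uniform mat4 projMatrix;   // hypothetical projection
varying vec4 mat_color;    // passed unchanged to the fragment shader

void main(void) {
    gl_Position = projMatrix * viewMatrix * vec4(pos, 1.0);
    mat_color = color;
}
`;
```

Vertex shaders in GLSL ES default to highp floats, so unlike the fragment shader this one needs no precision statement.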

Originally I thought that the transparency problem had to do with writing and reading lots of textures, but this turned out to be wrong. Rather, the problem seems to have something to do with shader compilation.

You can see demos of mouse picking and transparency here:

http://www.glowscript.org/#/user/GlowSc … usePicking

http://www.glowscript.org/#/user/GlowSc … ansparency

More on this:

As you know, I recently implemented transparency through depth peeling in GlowScript, with what I consider to be adequately clean code. Then it turned out that neither transparency nor picking (which uses related techniques) worked on MacBook Pros running Snow Leopard, though both did run on Lion, and neither worked on Linux (Ubuntu and CentOS). I've now managed to get transparency and picking working on all of these platforms, but I'm unhappy about what I had to do to make these features work, and I wonder whether you might have insight into my problem.

By tedious trial-and-error programming, a style I had never before had to use so extensively, I found a set of shader statements that worked. The key was to copy into the depth-peeling and pick shaders completely irrelevant code needed only for normal rendering, including in particular a lighting routine (irrelevant to constructing a depth map or to picking with false colors).

I found that Snow Leopard to a significant degree, and Ubuntu to a massive degree, were extraordinarily sensitive to the slightest code changes, changes that had no effect on Windows, my primary development environment. Especially in the Ubuntu case, using the latest NVIDIA driver for Linux, I not only had to identify by experiment which statements I could use and needed to use, but also had to learn how to "fool" Linux into letting me sneak in my needed functionality using syntax that mimicked lighting.

Have you ever encountered such a situation, in either OpenGL or WebGL?

You can see GlowScript examples including MousePicking and Transparency here:

http://www.glowscript.org/#/user/GlowSc … r/Examples

Take a look at the code for these examples. You’ll see that the programs are quite small and rather low-tech, suitable for nonexpert programmers.

I would of course be interested to hear of failures, on any platform.

I made a mistake in posting the URL, which should be this:

http://www.glowscript.org/#/user/GlowSc … /Examples/

While it is still fresh in my mind, I should report a very clear example of a problem with WebGL shaders that I've alluded to before.

In the process of developing an important new feature for GlowScript (glowscript.org), I added two attributes to the vertex shaders (there are several shaders for different modalities: display, pick, depth peeling, etc.). There had been 4 attributes, so now there are 6, well below the number supported by modern graphics cards and within the minimum of 8 that every WebGL implementation must support.
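Since six attributes is within what any WebGL implementation must accept, the attribute count itself is unlikely to be the culprit. A quick way to confirm the limit on a particular machine is to query MAX_VERTEX_ATTRIBS; the function below is a hypothetical sketch that uses only the standard gl.getParameter call.

```javascript
// Hypothetical check: does this GPU/driver accept the number of vertex
// attributes the shaders declare? WebGL 1 guarantees at least 8.
function attributeBudget(gl, attributeCount) {
  const max = gl.getParameter(gl.MAX_VERTEX_ATTRIBS);
  return { max, fits: attributeCount <= max };
}

// In a browser: attributeBudget(canvas.getContext("webgl"), 6)
```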

My main machine runs Windows, but having learned painfully in the past that one cannot expect cross-platform consistency in WebGL shaders, I tried out the new code on Snow Leopard on a MacBook Pro and on Ubuntu 11.10 on a desktop machine. Disaster. Three key features failed: the display of curve objects (an object consisting of connected points), mouse picking of objects, and transparency.

Remembering what I thought I had seen in the past, I was able to get all three features working again on Snow Leopard, and curves on Ubuntu, merely by placing dummy references to the two new attributes in those vertex shaders that didn't actually use them. This seems brain-dead; there is no such requirement on my Windows machine. I guess the implication is that the GLSL compilers on the Snow Leopard MacBook Pro and on Linux have serious flaws.
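One plausible explanation, not confirmed by these experiments, is that a GLSL compiler may drop an attribute the shader never reads, so the attribute is inactive and gl.getAttribLocation returns -1 for it; host code that still tries to feed that attribute then depends on how each driver and browser handles the mismatch, and the dummy references make the attributes active again. The hypothetical diagnostic below lists which attributes the CPU side is sending that a linked program does not actually use, via the standard ACTIVE_ATTRIBUTES and getActiveAttrib calls.

```javascript
// Hypothetical diagnostic: which attributes is the CPU side sending that the
// linked program does not use? Declared-but-unused attributes may be optimized
// away by the GLSL compiler and then are not reported as active.
function unusedAttributes(gl, program, sentNames) {
  const active = new Set();
  const count = gl.getProgramParameter(program, gl.ACTIVE_ATTRIBUTES);
  for (let i = 0; i < count; i++) {
    active.add(gl.getActiveAttrib(program, i).name);
  }
  return sentNames.filter((name) => !active.has(name));
}
```

Running this for each shader program (display, pick, depth peeling) would show which programs are being sent data they never read.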

I still have not managed to get mouse picking and transparency working on Linux. Presumably I’ll have to go through something like the extremely tedious experience I had before with Linux (and to a lesser extent with the Mac), which is to try every conceivable combination of GLSL statements in the forlorn hope of finding some set that happens to work. Stochastic programming is not my favorite line of work.

Bruce Sherwood

P.S. The new feature is pretty exciting. In your user program you can easily create your own mesh object and then use it just like the built-in mesh objects (box, cylinder, cone, sphere, pyramid, curve). You can see a three-triangle example here, written in CoffeeScript (you can use either JavaScript or CoffeeScript at glowscript.org), running under the test version 0.8dev:

http://www.glowscript.org/#/user/Bruce_ … ogram/Mesh

Note the interesting opacity fade to nothing. I did most of this development, but David Scherer showed me how to use the name of the newly created mesh as the name of the object that uses that mesh, something I wouldn’t have been able to figure out for myself.

I did a lot of work to ensure that each shader receives only the data it actually references and uses, and that made mouse picking and transparency work on all platforms: Windows, Mac, and Ubuntu (all with NVIDIA drivers). Moral of the tale: don't send anything from CPU to shaders that the shaders aren't going to use.
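A minimal sketch of that rule, assuming each shader program is drawn with a table of per-attribute buffers: look up each attribute's location, skip any that the program does not use (location -1), and turn off arrays left enabled by a previously drawn program. The function name and the buffers layout are hypothetical; only the gl calls are standard WebGL.

```javascript
// Hypothetical helper: feed a program only the attributes it actually uses.
// buffers maps attribute name -> { buffer: WebGLBuffer, size: components per vertex }.
// previouslyEnabled is the set of locations enabled for the last program drawn.
function bindActiveAttributes(gl, program, buffers, previouslyEnabled = new Set()) {
  const enabled = new Set();
  for (const [name, { buffer, size }] of Object.entries(buffers)) {
    const loc = gl.getAttribLocation(program, name);
    if (loc === -1) continue; // inactive in this program: send nothing
    gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
    gl.enableVertexAttribArray(loc);
    gl.vertexAttribPointer(loc, size, gl.FLOAT, false, 0, 0);
    enabled.add(loc);
  }
  // Enabled arrays are context state, so they stay on until explicitly turned off.
  for (const loc of previouslyEnabled) {
    if (!enabled.has(loc)) gl.disableVertexAttribArray(loc);
  }
  return enabled; // pass this back in as previouslyEnabled on the next draw
}
```

Keeping the returned set between draws means switching from the display program to the pick program automatically stops feeding attributes that the pick shaders never read.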