New GL spec means: new begging.

Just to be the first one to beg now that the GL 4.0 and 3.3 specs are out, I'd like to add these requests:

  1. separate shader objects (especially since there are now so many shaders running around)

  2. DSA (or at least a subset of DSA for manipulating textures and buffer objects)

  3. GL_NV_texture_barrier

  4. limited blending for integer texture targets

  • Agreed especially for DSA and texture barrier. Number 1!

  • A debug profile!

  • I think that with the step up from OpenGL 3.3, separate shader objects could be fine now.

  • Shader #include, and shader binaries.

More begging to come.

To join the chorus:

  • DSA for all non-deprecated functions. I don't care about having DSA for every function from 1.0 to 4.0; cut it down and only add DSA versions of the core 3.3+/4.0+ functions to the spec.

  • Shader binaries.

  • Debug profile.

And that’s it, more or less.

  • Multi-threaded rendering, i.e. command buffers, or something like that.
  • Environment (global) uniforms for GLSL.

Do you need more than UBOs for that?

It would be nice to have that on older hardware, because UBOs aren’t supported everywhere. Also it would be nice to be able to make just any single uniform an env-variable, without the need to create a struct for everything. That makes it far easier to have people plug in data through scripts and such, which the core engine does not know anything about.

For example, if my game-play requires a “time-of-day” variable, a level-script could just set that variable and all depending shaders would get that value, without further ado.
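For illustration, this is roughly what the engine side has to do today with plain uniforms, pushing the value into every program that uses it (the uniform name and helper are made up for the example):

// Push a script-provided "time of day" into every program that declares
// "uniform float u_TimeOfDay;" somewhere in its GLSL source.
void SetTimeOfDay(const GLuint *programs, int count, float timeOfDay)
{
    for (int i = 0; i < count; ++i)
    {
        GLint loc = glGetUniformLocation(programs[i], "u_TimeOfDay");
        if (loc != -1)
        {
            glUseProgram(programs[i]);   /* glUniform* acts on the bound program */
            glUniform1f(loc, timeOfDay);
        }
    }
}

An environment uniform would collapse all of that into a single, program-independent call.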

Jan.

  • Generalizing the GL_ARB_explicit_attrib_location extension to uniform variables (a GL_ARB_explicit_uniform_location) and to what we used to call varying variables in GL_EXT_separate_shader_objects. This would fix the only trouble I have with that extension.

  • Allowing blocks for vertex shader inputs and fragment shader outputs, and allowing a location to be assigned to these blocks. (A GLSL sketch of both ideas follows below.)
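Roughly, in GLSL terms (the attribute locations are what GL_ARB_explicit_attrib_location already gives us; the uniform location and the located output block are hypothetical syntax in the spirit of the request):

// Vertex shader sketch.
layout(location = 0) in vec3 Position;            // exists today (explicit_attrib_location)
layout(location = 1) in vec3 Normal;              // exists today

layout(location = 7) uniform mat4 ModelViewProj;  // requested: explicit uniform location

layout(location = 0) out VertexData               // requested: a locatable output block
{
    vec3 WorldNormal;
    vec2 TexCoord;
} Out;

void main()
{
    Out.WorldNormal = Normal;
    Out.TexCoord    = vec2(0.0);
    gl_Position     = ModelViewProj * vec4(Position, 1.0);
}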

Some thoughts:

  1. explicit uniform location… that is going to get messy once arrays, matrices and such come into play, as they usually take more than one slot… also, what keeps going through my head: if you have lots of uniforms common to many shaders, weren't UBOs supposed to handle that (see the UBO sketch after this list)? Can you give a use case for it?

  2. I am thinking that binary shaders are a bit of a red herring… over in GLES2 land there are binary shaders in some form on some platforms, and the Tegra implementation is particularly painful because the binary shader also depends on GL state (blending, masking, etc.); packing up all the possibilities would likely make the blob too big. Over in driver land, the driver just saves those compile jobs it needs… maybe something less “final” than binary shaders would do: a pre-compiled blob that is not complete but has done the bulk of the work? Admittedly, for binary shaders to actually work well, the shader source code will somehow need to be in there too, so that an old application still works on new hardware years from now… or, for that matter, when the GLSL compiler is fixed/improved.
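Regarding point 1: for reference, a minimal sketch of the UBO route for uniforms shared across many programs (the block contents, the EnvironmentData struct and binding index 0 are made up for the example):

// GLSL: the same block, declared in every shader that needs the shared data.
layout(std140) uniform Environment
{
    mat4  ViewProjection;
    float TimeOfDay;
};

// C side: fill one buffer, bind it to index 0, and point each program's
// "Environment" block at that index.
GLuint ubo;
glGenBuffers(1, &ubo);
glBindBuffer(GL_UNIFORM_BUFFER, ubo);
glBufferData(GL_UNIFORM_BUFFER, sizeof(EnvironmentData), &envData, GL_DYNAMIC_DRAW);
glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo);

GLuint blockIndex = glGetUniformBlockIndex(program, "Environment");
glUniformBlockBinding(program, blockIndex, 0);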

Explicit location is a door to a really nice implementation of SEMANTICS. No more GetUniformLocation / BindUniformLocation, GetAttribLocation / BindAttribLocation… let's say: fewer of those. I agree that with multiple locations per variable there is an issue. With blocks, and maybe a sizeof(block), it could be solved.

About binary shaders, the idea is not to ship binary shaders, but to offer a mechanism where an application can send a bunch of GLSL shaders in source-code form to the driver and get back the compiled blobs to reuse at a later time, to reduce application start-up time. Of course, as soon as the driver or hardware changes, the binary blob should be refused, and the app will have to re-feed the original GLSL source code.

There were some pretty interesting discussions and use cases about this on these forums; I will try to dig up a link.

Point 2 on this post

This post too

You found 2 (two!) posts about that topic? Weren't there like a gazillion threads about “binary shader blobs” up to now? And every two weeks someone needs to explain the idea again, and again, and again…

Anyway, I'd like to have that too.

Jan.

Well I found 2, you found none … :slight_smile:

Giggles, “Point 2 of this post”, that post was mine.

At any rate, I have seen the binary shader thing thrown around a lot. What goes through my head is the idea of a binary blob as a hint to be used optionally… and now the horribly icky part comes into play: deployment. Chances are one will need a binary blob for each major generation of each major GPU architecture. On PC, that right now means 6 blobs: (GL2 cards, GL3 cards, GL4 cards) × (nVidia or AMD). [I don't even consider Intel anymore at this point.] But the plot thickens: add OS and driver version on top.

One can go for this: the application does not ship those blobs but rather makes them at first run and then re-uses them, so we get things like:

glHintedCompileShader(const char *GLSLcode, int blob_length, GLubyte *binary_blob);

and maybe a query or something:

glGetIntegerv(GL_USED_BINARY_BLOB, &return_value);

and lastly glGetBinaryBlob.
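To make the intent concrete, a hedged sketch of how an application might drive these purely hypothetical entry points (glslSource is the application's GLSL source string; GL_BINARY_BLOB_LENGTH and the disk helpers are made up as well; nothing here exists in GL today):

// First run: no cached blob, compile from source only, then save the blob.
glHintedCompileShader(glslSource, 0, NULL);
GLint blobSize = 0;
glGetIntegerv(GL_BINARY_BLOB_LENGTH, &blobSize);   /* hypothetical query */
GLubyte *blob = (GLubyte *)malloc(blobSize);
glGetBinaryBlob(blobSize, blob);                   /* hypothetical */
SaveBlobToDisk("shader.blob", blob, blobSize);     /* app-side helper */

// Later runs: pass the saved blob as an optional hint next to the source;
// the driver either uses it or silently does a full compile of the source.
glHintedCompileShader(glslSource, blobSize, blob);
GLint usedBlob = 0;
glGetIntegerv(GL_USED_BINARY_BLOB, &usedBlob);     /* did the hint help? */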

It is feasible to do, and in most cases only the first run of the application would get a “full” compile… the sticky bit of the idea above is that it assumes the “binary blob” does not depend at all on GL state (who knows what the driver does as some GL state changes; see my comments on Tegra)…

But the idea of shipping static binary blobs to cut down on start-up time I don't see as being very feasible with constantly evolving hardware and drivers… it is kind of feasible in the embedded world, but only with an incredible amount of care.

Oh god! And I wasn’t aware of the binary blob idea!

In my mind it was way simpler: at first launch, you build the shaders from source, get and save the binary, and load it directly at the next launch. The binary would only be compatible with a specific driver version. A function to check binary validity would be used to check whether a shader rebuild is required.
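A minimal sketch of that flow, assuming entry points shaped like a glGetProgramBinary/glProgramBinary pair with a GL_PROGRAM_BINARY_LENGTH query; the disk helpers and rebuild fallback are placeholders:

// First launch: compile/link from GLSL as usual, then retrieve the blob.
GLint length = 0;
glGetProgramiv(program, GL_PROGRAM_BINARY_LENGTH, &length);
void  *binary = malloc(length);
GLenum format = 0;
glGetProgramBinary(program, length, NULL, &format, binary);
SaveToDisk("shaders.cache", format, binary, length);   /* app-side helper */

// Next launch: try the blob first; if the driver rejects it (new driver,
// new hardware), the link status comes back false and we rebuild from source.
glProgramBinary(program, format, binary, length);
GLint ok = GL_FALSE;
glGetProgramiv(program, GL_LINK_STATUS, &ok);
if (!ok)
    RebuildFromGLSLSource(program);                     /* app-side fallback */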

it is kind of feasible in the embedded world, but only with an incredible amount of care.

That kind of thing works in the embedded world because you have near-total control over the system you're running on. You get to say what your chipset is, what the driver version for that chipset is, etc. So shipping binary blobs is fine.

BBs for desktop OpenGL should not be interchangeable. We want them to solve a specific problem: shader compilation start-up time. The best way to do this is to just have them generate a block of data that may later be loadable as a program.

In any case, BBs are secondary. We now have five different shader stages; separation of programs into stages is becoming very necessary, if you want to do anything with geometry or the two tessellation stages. And we know that separation isn’t a hardware issue; it’s 100% just a bad decision made by the ARB 5 years ago.
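For what it's worth, a hedged sketch of what mixing and matching stages could look like with per-stage programs and a pipeline object; this is one possible shape, not something the current spec offers (vsSource, fsSource and otherFs are assumed to exist):

// Build each stage as its own program, then mix and match via a pipeline.
GLuint vs = glCreateShaderProgramv(GL_VERTEX_SHADER, 1, &vsSource);
GLuint fs = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &fsSource);

GLuint pipeline;
glGenProgramPipelines(1, &pipeline);
glUseProgramStages(pipeline, GL_VERTEX_SHADER_BIT, vs);
glUseProgramStages(pipeline, GL_FRAGMENT_SHADER_BIT, fs);
glBindProgramPipeline(pipeline);

// Swapping just the fragment stage no longer relinks everything else:
glUseProgramStages(pipeline, GL_FRAGMENT_SHADER_BIT, otherFs);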

Fix it!

Well, who can argue with a polite request like that.

Actually, both ideas have been suggested by different people and are listed separately on the User Wish List:

GLSL shader precompilation
Description: Having an off-line tool that compiles GLSL into something that you can feed into any implementation to create programs.
This something would likely be a form of the ARB assembly, PTX, or LLVM.
Benefit: Presumably, the compile/link time of this precompiled format would be lower than that of GLSL itself.
Some people also want this to make it harder for people to copy their GLSL code.

Compiled Shader Caching
Description: The ability to store compiled shaders in some format, so that subsequent executions of the programs will not require a full compile/link step. Or at least, will not require it unless drivers have changed.
Benefit: Improve program initialization/level loading (if shaders are part of level data).

The most comprehensive discussion on binary blobs started here:
http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=244045#Post244045

There is another way to cut down on shader startup time, and several folks have already commented that they do it:

Use the nVidia Cg compiler with -oglsl to get GL-asm.

Naturally, this is not going to fly too well when you want to use more advanced features on a non-nVidia card, as GL-asm is not even part of the GL spec and the asm extension itself has not been updated (though nVidia has made lots of NV_ extensions for it).
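For context, the asm in question is the ARB program interface; feeding the compiler's output to GL looks roughly like this (asmSource is assumed to hold the emitted !!ARBfp1.0 text):

// Load ARB fragment-program assembly produced offline (e.g. by cgc -oglsl).
GLuint prog;
glGenProgramsARB(1, &prog);
glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, prog);
glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                   (GLsizei)strlen(asmSource), asmSource);
glEnable(GL_FRAGMENT_PROGRAM_ARB);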

A potential middle ground might be:

  1. update the GL-asm interface in parallel with GL features
  2. allow the application to get GL-asm code back
  3. the ability to send GL-asm instead of GLSL to GL for shaders

Though this is not so great either, as making the GL-asm spec is not going to be fun (and we would find that each new GL release has 4 docs: GL core, GL compatibility, GLSL, and the (new) GL-asm). Worse, there is an epic chance that the driver has to apply some magic to the GL-asm too, and what counts as good GL-asm might heavily depend on the GPU.

The idea of sending some intermediate byte code is appealing, but what the byte code needs to store might also depend on GPU architecture… and even then the driver will need to “compile” the byte code… one can argue that the GL-asm idea above is just one form of this too.

Or, like someone mentioned long ago: the driver could simply keep a 50 MB hash table of GLSL program source code and its binary; no extra specs necessary. The binary format stays internal to the given driver, and a driver update invalidates that cache.