Right now I’m using the fixed-function pipeline to do my projective texturing (planar)…
I’m using the following to set up my texture matrix, and it works well.
glMatrixMode (GL_TEXTURE);
/*
glLoadIdentity ();
glTranslatef (0.5f, 0.5f, 0.0f);
glScalef (0.5f, 0.5f, 1.0f);
*/
glLoadMatrixf (mEnvBase.m); /* Result of the above logic is a constant matrix... */
glMultMatrixf (mEnvProj.m);
glMultMatrixf (mEnvModel.m);
glMatrixMode (GL_MODELVIEW);
My question is, what’s the most efficient approach to doing the same thing
in a vertex program?
Should I just keep the code as it is and multiply the texture coordinates by
the appropriate texture matrix in the vertex program? I don’t want to pass
the vertex program the projection*modelview matrix (see below) for performance
reasons.
Cg provides a special projective texture lookup intrinsic for fragment programs
(tex2Dproj) that, as I understand it, divides the texture coordinate by its last
(q) component. Is that equivalent to the glTranslate/glScale logic above? I’m
assuming there’s some advantage to doing it in a fragment program that I’m just
too stupid to see. It seems to me that all of the work could be done per-vertex,
saving a few GPU cycles per fragment… Am I missing something here?
I also noticed (to my surprise) that pre-multiplying the env projection and
modelview matrices on the CPU every time the projector’s view changes (it only
updates once every 4+ frames, to keep the pixel buffer from eating too much
fillrate) is actually 10-15% slower than doing the projection and modelview
multiplies EVERY FRAME with glMultMatrixf.
The following, which logically seems like an optimization, turns out to be slower
than the original code:
[Update Pixel Buffer (once every 4+ frames)]
--------------------------------------------
...
const l3dMatrix& mEnvProj = world.camera.projection ();
const l3dMatrix& mEnvModel = world.camera.modelview ();
mEnvProjView = mEnvProj * mEnvModel;
...
[Project Pixel Buffer's Texture (every frame)]
----------------------------------------------
...
glLoadMatrixf (mEnvBase.m);
glMultMatrixf (mEnvProjView.m);
glMatrixMode (GL_MODELVIEW);
...
Lesson learned here:
- Don’t assume that fewer operations per frame/update are always “better”.
  (The driver can take advantage of SIMD instruction sets such as SSE/3DNow! and
  other optimizations that, unless you have a REALLY nice math library, could prove
  noticeably faster than your own code…)
Figured I’d share that little bit, since it really surprised me.