Matrix multiplication question

OK, so lately I have been writing a simple rendering engine for iOS using OpenGL ES 2.0. Doing anything moderately exciting requires adding back the model, view and projection matrices.

One of the most common operations a graphics engine performs is 4x4 matrix multiplication, which becomes quite computationally expensive when it is performed many times per frame:

matrix[0]  = m1[0]*m2[0]  +  m1[1]*m2[4]  + m1[2]*m2[8]   + m1[3]*m2[12];
matrix[1]  = m1[0]*m2[1]  +  m1[1]*m2[5]  + m1[2]*m2[9]   + m1[3]*m2[13];
matrix[2]  = m1[0]*m2[2]  +  m1[1]*m2[6]  + m1[2]*m2[10]  + m1[3]*m2[14];
…

matrix[15] = m1[12]*m2[3] +  m1[13]*m2[7] + m1[14]*m2[11] + m1[15]*m2[15];
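For reference, the unrolled lines above can also be written as a loop. This is only a minimal sketch: the function name mat4_mul is made up for illustration, and it assumes the same index layout as the snippet above, with element (row, col) stored at index row*4 + col.

```c
#include <assert.h>

/* out = a * b for 4x4 matrices stored as 16 floats.
   Assumed layout (same as the unrolled lines): element (row, col) is at [row*4 + col].
   mat4_mul is a hypothetical name, not part of any library. */
void mat4_mul(float out[16], const float a[16], const float b[16])
{
    /* Writing into one of the inputs would overwrite values that are still needed. */
    assert(out != a && out != b);

    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += a[row * 4 + k] * b[k * 4 + col];
            }
            out[row * 4 + col] = sum;
        }
    }
}
```

That is 16 output elements with 4 multiplications each, so 64 multiplications per 4x4 product.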

Now, for the sake of optimisation on the CPU side, I could set up a matrix multiplication daemon with threading to split the load.

However, the shader language provides built-in matrix types and operations:

uniform mat4 m_model;
uniform mat4 m_view;
uniform mat4 m_projection;

gl_Position = m_projection * m_view * m_model * v_position;

Now my question is: is the matrix multiplication as expressed in the shader language (and presumably executed on the GPU) optimised? Does the matrix multiplication happen serially or in parallel? Is it better to send a model-view-projection matrix precomputed on the CPU to the vertex shader, or is what I am doing here OK?

The difference between doing the computation on the CPU and in the vertex shader is that computations in the vertex shader are done per vertex. If you have a model with 10000 vertices, it’s almost always better to compute a combined model-view-projection matrix once on the CPU, since the matrix-matrix products then run once per draw call instead of 10000 times (even though GPUs are very efficient at arithmetic). If your models have only 4 vertices, it’s going to make very little difference: the bottleneck will be draw call overhead rather than arithmetic.
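As a rough sketch of the CPU-side approach, something like the following could run once per draw call. The names mat4_multiply and build_mvp are hypothetical, and the layout (element (row, col) at index row*4 + col, column vectors) is an assumption carried over from the question's snippet.

```c
#include <assert.h>

/* out = a * b, with element (row, col) stored at [row*4 + col] (assumed layout). */
static void mat4_multiply(float out[16], const float a[16], const float b[16])
{
    assert(out != a && out != b);  /* in-place output is not supported */
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += a[row * 4 + k] * b[k * 4 + col];
            }
            out[row * 4 + col] = sum;
        }
    }
}

/* Hypothetical helper: mvp = projection * view * model, computed once per
   draw call so the vertex shader needs only one mat4 * vec4 per vertex. */
void build_mvp(float mvp[16], const float projection[16],
               const float view[16], const float model[16])
{
    float view_model[16];
    mat4_multiply(view_model, view, model);
    mat4_multiply(mvp, projection, view_model);
}
```

The result would then be uploaded as a single mat4 uniform (for example with glUniformMatrix4fv) and the shader would do one matrix-vector multiplication per vertex. The array layout has to match what the shader expects, so a transpose may be needed depending on how your engine stores matrices.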

Another tip is that if you are applying several matrices to a vector in a shader, group the operations so that every step is a matrix-vector multiplication. Because * associates left to right, your example is equivalent to


gl_Position = ((m_projection * m_view) * m_model) * v_position;

which costs 64 + 64 + 16 = 144 multiplications (two matrix-matrix products and one matrix-vector product), but


gl_Position = m_projection * (m_view * (m_model * v_position));

needs only 16 + 16 + 16 = 48 multiplications. A smart compiler might make that transformation for you, but it’s better to be on the safe side.
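The counts above can be checked with a few lines of arithmetic. This is just an illustrative sketch with made-up function names; it counts scalar multiplications only and says nothing about how a particular GPU actually schedules the work.

```c
/* Scalar multiplications in an (n x n) * (n x n) product:
   n*n output elements, n multiplications each. */
int mat_mat_mults(int n) { return n * n * n; }

/* Scalar multiplications in an (n x n) * (n-vector) product:
   n output elements, n multiplications each. */
int mat_vec_mults(int n) { return n * n; }

/* ((P * V) * M) * v: two matrix-matrix products, then one matrix-vector product. */
int left_to_right_cost(int n) { return 2 * mat_mat_mults(n) + mat_vec_mults(n); }

/* P * (V * (M * v)): three matrix-vector products. */
int right_to_left_cost(int n) { return 3 * mat_vec_mults(n); }
```

For n = 4 these give 144 and 48, matching the figures above.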
