Matrix Storage Conventions
ref_cracker
12-10-2006, 10:56 PM
Most code I see stores a matrix in memory like so:
Legend:
ax = axis X
ay = axis Y
az = axis Z
t = translation
[ax.x ay.x az.x 0.0]
[ax.y ay.y az.y 0.0]
[ax.z ay.z az.z 0.0]
[t.x t.y t.z 1.0f]
Am I crazy for wanting to store it in memory like this?
[ax.x ax.y ax.z t.x]
[ay.x ay.y ay.z t.y]
[az.x az.y az.z t.z]
[0.0 0.0 0.0 1.0f]
Can someone explain this to me? It seems to me that the second one is a better memory representation for SIMD optimizations, because if you know the matrix is a world-space transform you can just leave out the last Vector4 and off you go. I wrote a shader for matrix skinning about 3 years ago; I loaded the registers this way and off I went with dp4s. Maybe I'm missing something here and just don't know enough about lower-level SIMD optimization.
Thanks in advance!
Reedbeta
12-11-2006, 01:21 AM
It's all conventions. There are two different kinds of conventions at work here. The first is whether you mathematically think of vectors as column vectors or row vectors: in the first case (which OpenGL uses), you multiply a matrix by a vector, so your basis vectors go into the columns of the matrix; in the second case (which D3D uses), you multiply a vector by a matrix, so your basis vectors go into the rows of the matrix. The second is whether you store your matrices in memory row by row (row-major order) or column by column (column-major order). OpenGL expects column-major matrices and D3D expects row-major ones, which has the curious effect that the memory layout of the coefficients is exactly the same for both libraries, since there are two transpositions involved.
Anyway, if you use premultiplication (vector times matrix) with row-major order OR postmultiplication (matrix times vector) with column-major order, then each basis vector comes out contiguous in memory and so does the translation vector, with the 0, 0, 0, 1 entries landing in the fourth component of each. That may very well be why those two combinations of conventions are the ones chosen by D3D and OpenGL respectively.
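To make the "two transpositions cancel" point concrete, here is a minimal sketch that builds the same translation under both combinations and compares the raw floats. The type alias Mat4 and both function names are made up for illustration and do not come from either API.

```cpp
#include <array>
#include <cstddef>

using Mat4 = std::array<float, 16>;  // 16 floats, in the order they sit in memory

// Row-vector convention (v' = v * M), stored row-major:
// element (row, col) lives at index row * 4 + col; translation is the bottom row.
Mat4 translation_row_vectors_row_major(float tx, float ty, float tz) {
    Mat4 m{};  // zero-initialized
    auto at = [&m](std::size_t row, std::size_t col) -> float& {
        return m[row * 4 + col];
    };
    for (std::size_t i = 0; i < 4; ++i) at(i, i) = 1.0f;
    at(3, 0) = tx;
    at(3, 1) = ty;
    at(3, 2) = tz;
    return m;
}

// Column-vector convention (v' = M * v), stored column-major:
// element (row, col) lives at index col * 4 + row; translation is the last column.
Mat4 translation_column_vectors_column_major(float tx, float ty, float tz) {
    Mat4 m{};  // zero-initialized
    auto at = [&m](std::size_t row, std::size_t col) -> float& {
        return m[col * 4 + row];
    };
    for (std::size_t i = 0; i < 4; ++i) at(i, i) = 1.0f;
    at(0, 3) = tx;
    at(1, 3) = ty;
    at(2, 3) = tz;
    return m;
}
```

Both functions should return identical arrays, with the translation at indices 12, 13, 14. Only the mathematical picture of the matrix differs, not the bytes.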
ref_cracker
12-11-2006, 01:52 AM
Reedbeta,
Look more closely at the way I've laid out the structures.
In both diagrams the memory is laid out like so.
[ 0, 1, 2, 3]
[ 4, 5, 6, 7]
[ 8, 9,10,11]
[12,13,14,15]
I understand pretty much what you're explaining, but in my examples I have t.x t.y t.z at locations 12, 13, 14 in example A and at locations 3, 7, 11 in example B.
The confusion here stems from me only wanting to store 3 Vector4s for a world space matrix. In game code usually all the 4x4 matrices keep their 0,0,0,1 constant through scaling, rotating, and translating. So why have the extra computations for the 0,0,0,1 components in game logic?
If your matrix was laid out as 3 Vector4s (closely following example B's layout but dropping the last Vector4) you can apply the matrix to a point with 3 DP4s! And I wanted to see if I could preserve the ordering for the 4x4 implementation.
Good link describing what you were talking about.
http://www.mindcontrol.org/~hplus/graphics/matrix-layout.html
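The 3 x Vector4 idea above can be sketched on the CPU as follows. This is only an illustration under assumptions: Vec4, Mat3x4, dot4 and transform_point are made-up names, each stored Vector4 is treated as one matrix row with the translation in its w component, and points carry w = 1.

```cpp
struct Vec4 {
    float x, y, z, w;
};

// Scalar stand-in for a shader dp4 instruction.
inline float dot4(const Vec4& a, const Vec4& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// Only the first three rows of the 4x4 are stored; the implicit
// fourth row is (0, 0, 0, 1) and is never read.
struct Mat3x4 {
    Vec4 row[3];
};

// One dp4 per output coordinate. With p.w == 1 the w component of each
// row adds the translation; with p.w == 0 (a direction) it is ignored.
inline Vec4 transform_point(const Mat3x4& m, const Vec4& p) {
    return Vec4{dot4(m.row[0], p), dot4(m.row[1], p), dot4(m.row[2], p), p.w};
}
```

Passing w through unchanged is exactly what the dropped (0, 0, 0, 1) row would have computed, so nothing is lost for affine transforms.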
Reedbeta
12-11-2006, 09:42 AM
Yep, the memory layout you posted is correct for both libraries. The slot numbering you posted is row-major order, which is correct for D3D; in OpenGL the slots are numbered in column-major order:
[ 0, 4, 8,12]
[ 1, 5, 9,13]
[ 2, 6,10,14]
[ 3, 7,11,15]
As mentioned before though, OpenGL's matrices are transposed from D3D's, so the memory layout is still the same.
Actually, Direct3D specifies that the memory layout of the 4x4 matrix is in row major order, but the multiplication occurs between a row vector and a matrix column. This results in element fetches which are not in consecutive memory locations. I would imagine that drivers would store the transpose of the matrix internally so that the multiplication operation is more cache-friendly.
When loading matrices into vertex shader constant registers, the transpose of the D3D 4x4 matrix is usually what gets loaded, although this isn't strictly required (http://www.mvps.org/directx/articles/nontranspose.htm).
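As a rough CPU-side illustration of why the transpose helps, the sketch below compares a row vector times a row-major matrix (strided reads down each column) with one dot product per 4-float register of the transposed matrix (contiguous reads). All names here are made up for the example and model the idea only; they are not Direct3D calls.

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<float, 16>;  // row-major: element (r, c) at r * 4 + c

// Swap rows and columns.
Mat4 transpose(const Mat4& m) {
    Mat4 t{};
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
            t[c * 4 + r] = m[r * 4 + c];
    return t;
}

// v * M taken literally: each output component walks down a column
// of M, so the reads are 4 floats apart in memory.
Vec4 mul_row_vector(const Vec4& v, const Mat4& m) {
    Vec4 out{};
    for (std::size_t c = 0; c < 4; ++c)
        for (std::size_t r = 0; r < 4; ++r)
            out[c] += v[r] * m[r * 4 + c];
    return out;
}

// Shader-style evaluation: "register" i is the 4 consecutive floats
// starting at i * 4, and each output component is one dp4 with v.
Vec4 mul_dp4_registers(const Mat4& registers, const Vec4& v) {
    Vec4 out{};
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t k = 0; k < 4; ++k)
            out[i] += registers[i * 4 + k] * v[k];
    return out;
}
```

Under these assumptions, mul_dp4_registers(transpose(m), v) should give the same result as mul_row_vector(v, m), which is why the transposed matrix is the one that pairs naturally with dp4.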
juhnu
12-11-2006, 06:26 PM
"In game code usually all the 4x4 matrices keep their 0,0,0,1 constant through scaling, rotating, and translating. So why have the extra computations for the 0,0,0,1 components in game logic?"
For simplicity, and because you might very well need inverse transformations and projections, which need the extra space. Unless profiling shows that this is a real bottleneck in your application, it's probably not worth the effort to optimize.
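The projection case can be sketched like this: a perspective matrix needs a fourth row other than (0, 0, 0, 1), so it cannot be stored as 3 Vector4s. The matrix below is a deliberately simplified, hypothetical projection onto the plane z = d (not the matrix either API builds), and all names are made up for the example.

```cpp
#include <cassert>

struct Vec4 {
    float x, y, z, w;
};

// Full 4x4, stored as four rows, column-vector convention (v' = M * v).
struct Mat4 {
    Vec4 row[4];
};

inline float dot4(const Vec4& a, const Vec4& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

inline Vec4 transform(const Mat4& m, const Vec4& p) {
    return Vec4{dot4(m.row[0], p), dot4(m.row[1], p),
                dot4(m.row[2], p), dot4(m.row[3], p)};
}

// Simplified perspective: the fourth row copies z / d into w.
// That row is (0, 0, 1/d, 0), not (0, 0, 0, 1).
inline Mat4 simple_perspective(float d) {
    Mat4 m = {{
        {1.0f, 0.0f, 0.0f, 0.0f},
        {0.0f, 1.0f, 0.0f, 0.0f},
        {0.0f, 0.0f, 1.0f, 0.0f},
        {0.0f, 0.0f, 1.0f / d, 0.0f},
    }};
    return m;
}

// Divide by w to get the projected point.
inline Vec4 perspective_divide(const Vec4& c) {
    assert(c.w != 0.0f);  // points on the plane z = 0 cannot be projected
    return Vec4{c.x / c.w, c.y / c.w, c.z / c.w, 1.0f};
}
```

For example, with d = 1 the point (2, 4, 2, 1) comes out of transform with w = 2, and the divide maps it to (1, 2, 1, 1). A 3 x Vector4 matrix has no way to produce that w.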