
Vertex Size


idreamlovey
08-17-2007, 10:19 PM
I have three vertex formats, shown below:

#include <d3dx9.h> // D3DXVECTOR3, D3DCOLOR, FLOAT

struct D3DVERTEX
{
    D3DXVECTOR3 p;    // 12 bytes
    D3DXVECTOR3 n;    // 12 bytes
    FLOAT tu, tv;     // 8 bytes
};  // 32-byte stride

struct D3DVERTEX2T
{
    D3DXVECTOR3 p;    // 12 bytes
    D3DXVECTOR3 n;    // 12 bytes
    FLOAT tu, tv;     // 8 bytes
    FLOAT tu2, tv2;   // 8 bytes
    FLOAT tu3, tv3;   // 8 bytes, unused: padding to a 64-byte stride
    FLOAT tu4, tv4;   // 8 bytes, unused: padding to a 64-byte stride
    FLOAT tu5, tv5;   // 8 bytes, unused: padding to a 64-byte stride
};  // 64-byte stride

struct D3DVERTEX3C
{
    D3DXVECTOR3 p;    // 12 bytes
    D3DCOLOR Color;   // 4 bytes
    FLOAT tu, tv;     // 8 bytes
    FLOAT tu2, tv2;   // 8 bytes, unused: padding to a 32-byte stride
};  // 32-byte stride
My question: should I use three vertex buffers (VBs), one for each vertex type above, or a single VB with this combined vertex type?

struct D3DVERTEX  // combined format, replacing all three structs above
{
    D3DXVECTOR3 p;     // 12 bytes
    D3DXVECTOR3 n;     // 12 bytes
    D3DCOLOR Color;    // 4 bytes
    FLOAT tu, tv;      // 8 bytes
    FLOAT tu2, tv2;    // 8 bytes
    FLOAT tu3, tv3;    // 8 bytes, unused: padding to a 64-byte stride
    FLOAT tu4, tv4;    // 8 bytes, unused: padding to a 64-byte stride
    D3DCOLOR Color2;   // 4 bytes, unused: padding to a 64-byte stride
};  // 64-byte stride
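A quick way to double-check the stride comments is to let the compiler count the bytes. This is a minimal sketch that compiles without the DirectX SDK, assuming stand-in types: `Vec3` for D3DXVECTOR3 (three floats) and a 32-bit unsigned integer for D3DCOLOR. The struct names are hypothetical, and the unused fields are collapsed into `pad` arrays.

```cpp
#include <cstdint>

// Stand-ins (assumption): D3DXVECTOR3 is three floats, D3DCOLOR is a 32-bit DWORD.
struct Vec3 { float x, y, z; };
using Color = std::uint32_t;

// Mirrors D3DVERTEX: position, normal, one texcoord set.
struct VertexPNT { Vec3 p; Vec3 n; float tu, tv; };

// Mirrors D3DVERTEX2T: two texcoord sets plus 24 bytes of padding (tu3..tv5).
struct VertexPNT2 { Vec3 p; Vec3 n; float tu, tv, tu2, tv2; float pad[6]; };

// Mirrors D3DVERTEX3C: position, color, one texcoord set, 8 bytes of padding (tu2, tv2).
struct VertexPC { Vec3 p; Color color; float tu, tv; float pad[2]; };

// Mirrors the combined format: 44 used bytes, 20 bytes of padding (tu3..tv4, Color2).
struct VertexAll { Vec3 p; Vec3 n; Color color; float tu, tv, tu2, tv2; float pad[4]; Color color2; };

// Every member is 4-byte aligned, so common compilers insert no hidden padding.
static_assert(sizeof(VertexPNT) == 32, "D3DVERTEX should be 32 bytes");
static_assert(sizeof(VertexPNT2) == 64, "D3DVERTEX2T should be 64 bytes");
static_assert(sizeof(VertexPC) == 32, "D3DVERTEX3C should be 32 bytes");
static_assert(sizeof(VertexAll) == 64, "combined vertex should be 64 bytes");
```

The same static_asserts written against the real SDK structs would catch any layout surprise at compile time.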

In the second case you can clearly see the unnecessary memory consumption. In the first case the wasted memory is limited, because there is not much padded data: say I have 1 MB of D3DVERTEX, 200-300 kB of D3DVERTEX2T and 100-200 kB of D3DVERTEX3C. But I have 3 VB switches. In the second case there are no VB switches and the code looks simpler, but there is a memory overhead, which increases the amount of data to be transferred.
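To put numbers on the overhead of the second case: promoting the two 32-byte formats to the 64-byte combined layout doubles their size, while D3DVERTEX2T is already 64 bytes. A small compile-time sketch, assuming 1 MB = 1024 kB and taking the upper end of each range from the post:

```cpp
#include <cstddef>

constexpr std::size_t kKB = 1024;

// Size of `bytes` worth of vertices at `fromStride` once re-stored at `toStride`.
constexpr std::size_t restride(std::size_t bytes, std::size_t fromStride, std::size_t toStride) {
    return bytes / fromStride * toStride;
}

// Assumed data sizes (upper end of the ranges in the post).
constexpr std::size_t kPlainBytes = 1024 * kKB; // D3DVERTEX, 32-byte stride
constexpr std::size_t k2TBytes    = 300 * kKB;  // D3DVERTEX2T, 64-byte stride
constexpr std::size_t k3CBytes    = 200 * kKB;  // D3DVERTEX3C, 32-byte stride

// Case 1: three VBs, each format at its own stride.
constexpr std::size_t kSeparate = kPlainBytes + k2TBytes + k3CBytes;

// Case 2: one VB, everything at the combined 64-byte stride.
constexpr std::size_t kCombined = restride(kPlainBytes, 32, 64)
                                + restride(k2TBytes, 64, 64)
                                + restride(k3CBytes, 32, 64);

static_assert(kSeparate == 1524 * kKB, "about 1.5 MB");
static_assert(kCombined == 2748 * kKB, "about 2.7 MB");
```

Under these assumptions the combined buffer holds roughly 1.8 times as much vertex data (2748 kB versus 1524 kB); at the lower end of the ranges it is 2448 kB versus 1324 kB.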
1. Which one do you think is better, or is there another way to improve things?
2. I have also heard of multiple streams. How would I split the data into two streams without conflicts?
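In D3D9, multiple streams are described by an array of D3DVERTEXELEMENT9 entries (each names a stream, a byte offset, a type, and a usage plus usage index) terminated by D3DDECL_END(), turned into a declaration with IDirect3DDevice9::CreateVertexDeclaration, and bound with SetVertexDeclaration plus one SetStreamSource call per stream. The sketch below does not touch D3D: it uses a hypothetical, simplified `Element` struct and a hypothetical `validLayout` checker to illustrate what "no conflicts" plausibly means, assuming two rules: each semantic (usage plus usage index) appears only once, and elements sharing a stream do not overlap and fit inside that stream's stride.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one D3DVERTEXELEMENT9 entry.
struct Element {
    int stream;          // which vertex buffer the element is read from
    std::size_t offset;  // byte offset within one vertex of that stream
    std::size_t size;    // element size in bytes
    std::string usage;   // semantic name, e.g. "POSITION", "TEXCOORD"
    int usageIndex;      // distinguishes TEXCOORD0 from TEXCOORD1, etc.
};

// True if the layout satisfies the two assumed rules described above.
bool validLayout(const std::vector<Element>& elems, const std::vector<std::size_t>& strides) {
    for (std::size_t i = 0; i < elems.size(); ++i) {
        const Element& a = elems[i];
        if (a.stream < 0 || static_cast<std::size_t>(a.stream) >= strides.size()) return false;
        if (a.offset + a.size > strides[a.stream]) return false;   // must fit in the stride
        for (std::size_t j = i + 1; j < elems.size(); ++j) {
            const Element& b = elems[j];
            if (a.usage == b.usage && a.usageIndex == b.usageIndex) return false; // duplicate semantic
            bool overlap = a.stream == b.stream &&
                           a.offset < b.offset + b.size && b.offset < a.offset + a.size;
            if (overlap) return false;                              // bytes claimed twice
        }
    }
    return true;
}
```

For example, position, normal and the first texture coordinate set could live in stream 0 (32-byte stride) and the second set in stream 1 (8-byte stride), so draws that don't need the second set can use a one-stream declaration.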

Reedbeta
08-17-2007, 10:46 PM
I would say using three vertex buffers is better than wasting memory. A bit of extra code complexity is a small price to pay for keeping the CPU-to-GPU memory transfer as small as possible.

I don't personally know how to set up multiple streams in D3D, though I'm sure someone else will reply about that. But you might be able to avoid the switch() statements and write the code more cleanly using template tricks.
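One reading of the "template tricks" suggestion: give each vertex struct a traits specialization and write the draw code once as a template, so the compiler picks the stride and format instead of a switch(). A minimal sketch; `VertexTraits`, `Format`, `DrawCall`, and `makeDrawCall` are hypothetical names, and in real D3D9 code the function body would call SetFVF and SetStreamSource rather than return a record.

```cpp
#include <cstddef>
#include <cstdint>

struct Vec3 { float x, y, z; };   // stand-in for D3DXVECTOR3
using Color = std::uint32_t;      // stand-in for D3DCOLOR

struct VertexPNT { Vec3 p; Vec3 n; float tu, tv; };                  // like D3DVERTEX
struct VertexPC  { Vec3 p; Color color; float tu, tv, tu2, tv2; };   // like D3DVERTEX3C

enum Format { kFormatPNT, kFormatPC };  // hypothetical ids (an FVF code in real code)

// One specialization per vertex type; using an unsupported type fails to compile.
template <typename V> struct VertexTraits;
template <> struct VertexTraits<VertexPNT> { static constexpr Format format = kFormatPNT; };
template <> struct VertexTraits<VertexPC>  { static constexpr Format format = kFormatPC; };

// Everything a draw needs to know about the vertex format.
struct DrawCall { Format format; std::size_t stride; std::size_t vertexCount; };

// Written once for all vertex types: stride comes from sizeof, format from the traits.
template <typename V>
constexpr DrawCall makeDrawCall(std::size_t vertexCount) {
    return DrawCall{VertexTraits<V>::format, sizeof(V), vertexCount};
}
```

Adding a fourth vertex type then means one new struct and one traits specialization, with no switch to update.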

idreamlovey
08-18-2007, 03:09 AM
Thanks Reedbeta, I am thinking about that too. I read about multiple streams in the SDK documentation, and again I found myself on a sinking boat, because I learned about the costs of the different state changes, which are as follows:

Sorted from most to least expensive:

API call                               Average number of cycles
SetVertexDeclaration                   6500 - 11250
SetFVF                                 6400 - 11200
SetVertexShader                        3000 - 12100
SetPixelShader                         6300 - 7000
SPECULARENABLE                         1900 - 11200
SetRenderTarget                        6000 - 6250
SetPixelShaderConstant (1 constant)    1500 - 9000
NORMALIZENORMALS                       2200 - 8100
LightEnable                            1300 - 9000
SetStreamSource                        3700 - 5800
LIGHTING                               1700 - 7500
DIFFUSEMATERIALSOURCE                  900 - 8300
AMBIENTMATERIALSOURCE                  900 - 8200
COLORVERTEX                            800 - 7800
SetLight                               2200 - 5100
SetTransform                           3200 - 3750
SetIndices                             900 - 5600

And so on.
Now, in my case, it is clear that using multiple streams would reduce the traffic between the CPU and GPU. But I would have to make 3 additional SetVertexDeclaration calls and 6 SetStreamSource calls. What do you think?
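Taking the table at face value (these are CPU cycles spent inside each API call), the extra calls are easy to total. A compile-time sketch; the three-VB comparison assumes each VB switch costs one SetFVF plus one SetStreamSource, which is an assumption about the first case rather than something stated in the post.

```cpp
#include <cstdint>

// Min/max cycle cost of one API call, from the table above.
struct CycleRange { std::uint64_t lo, hi; };

constexpr CycleRange kSetVertexDeclaration{6500, 11250};
constexpr CycleRange kSetFVF{6400, 11200};
constexpr CycleRange kSetStreamSource{3700, 5800};

// Cost of making the same call n times.
constexpr CycleRange scale(CycleRange r, std::uint64_t n) { return CycleRange{r.lo * n, r.hi * n}; }
// Combined cost of two groups of calls.
constexpr CycleRange sum(CycleRange a, CycleRange b) { return CycleRange{a.lo + b.lo, a.hi + b.hi}; }

// Multistream case from the post: 3 declarations + 6 stream bindings.
constexpr CycleRange kMultiStream = sum(scale(kSetVertexDeclaration, 3), scale(kSetStreamSource, 6));
// Assumed three-VB case: 3 x (SetFVF + SetStreamSource).
constexpr CycleRange kThreeBuffers = sum(scale(kSetFVF, 3), scale(kSetStreamSource, 3));

static_assert(kMultiStream.lo == 41700 && kMultiStream.hi == 68550, "multistream range");
static_assert(kThreeBuffers.lo == 30300 && kThreeBuffers.hi == 51000, "three-VB range");
```

At an assumed 2 GHz CPU clock, 41,700 to 68,550 cycles is roughly 21 to 34 microseconds per frame, a small slice of the 16.7 ms frame budget at 60 fps.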

kusma
08-18-2007, 06:53 AM
Don't confuse function-call overhead with state-change overhead. Many GPUs have a fixed state-change overhead when you change whatever state requires, for instance, an internal pixel shader to be regenerated. D3D buffers state changes and passes them to the driver in a batch. How this affects performance is, of course, very dependent on the GPU and driver architecture.
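A toy model of this point, assuming nothing about how D3D or any driver actually implements it: API calls only record the latest value for each state, and at draw time only values that differ from what the driver last received are forwarded. The hypothetical `StateBuffer` class counts both, which shows why the number of Set* calls is not the same as the number of state changes the hardware pays for.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Toy model of deferred state changes (not D3D's real implementation).
class StateBuffer {
public:
    // An API call: cheap bookkeeping, only the last value per state is kept.
    void set(int state, int value) {
        ++apiCalls_;
        pending_[state] = value;
    }

    // At draw time, forward only states whose value actually changed.
    // Returns how many changes reached the "driver" for this draw.
    std::size_t draw() {
        std::size_t sent = 0;
        for (const auto& kv : pending_) {
            auto it = applied_.find(kv.first);
            if (it == applied_.end() || it->second != kv.second) {
                applied_[kv.first] = kv.second;
                ++sent;
            }
        }
        pending_.clear();
        driverChanges_ += sent;
        return sent;
    }

    std::size_t apiCalls() const { return apiCalls_; }
    std::size_t driverChanges() const { return driverChanges_; }

private:
    std::map<int, int> pending_;  // state -> value recorded since the last draw
    std::map<int, int> applied_;  // state -> value the driver last received
    std::size_t apiCalls_ = 0;
    std::size_t driverChanges_ = 0;
};
```

Three calls that end on the same value as before a draw cost three API calls here but, in this model, zero driver-side changes.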

idreamlovey
08-18-2007, 07:04 AM
So Kusma, do you give a thumbs up to multiple streams to save memory bandwidth, or should I stick with the 3 VBs as described in my first case and as Reedbeta suggested?

kusma
08-18-2007, 07:32 AM
idreamlovey: Not necessarily. Different GPUs have different caching characteristics, so it's difficult to tell. My first instinct is to avoid padding bytes, but I'm not sure how well different vertex processors would handle that. I guess this is something you should profile before you make a decision.
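A minimal timing helper for that kind of A/B profiling, as a sketch: `averageFrameMs` is a hypothetical name, and it measures CPU wall-clock time with std::chrono::steady_clock. For a D3D app the `frame` callable would render and Present() a whole frame; since the CPU can run ahead of the GPU, averaging over many whole frames gives a more meaningful number than timing individual API calls.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Average wall-clock milliseconds per call of `frame`, over `frames` calls.
template <typename F>
double averageFrameMs(F&& frame, std::size_t frames) {
    assert(frames > 0);  // need at least one frame to average over
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (std::size_t i = 0; i < frames; ++i) frame();
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / static_cast<double>(frames);
}
```

Run the same scene once with the three-VB layout and once with the padded layout and compare the two averages; a long, fixed frame count helps smooth out noise.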

idreamlovey
08-18-2007, 10:39 AM
Having gone through the ATI optimization papers, I see they suggest doing whatever it takes to keep the number of vertex streams as low as possible, since more streams may lower performance for hard-to-predict reasons. So I will go with my first choice and see what happens. Anyway, thank you all for your kind replies.

idreamlovey
08-19-2007, 09:11 PM
Here is another solution I found: I use one VB but don't specify its FVF when creating it. I fill it with the 32-byte static data as I did before, and at render time I set the FVF according to the material. This keeps my code simple, and I gained an extra 10 fps while rendering 56k triangles with all effects enabled.
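One way to do the bookkeeping for that single-buffer approach, as a sketch under assumptions: in D3D9, IDirect3DDevice9::CreateVertexBuffer accepts an FVF of 0 for a non-FVF buffer, and DrawPrimitive's StartVertex is counted in vertices of the stride passed to SetStreamSource. If each batch starts at a byte offset that is a multiple of its own stride, it can be drawn with a stream offset of 0 and StartVertex = offset / stride, which avoids relying on non-zero stream offsets (as far as I know, those depend on the D3DDEVCAPS2_STREAMOFFSET cap). The hypothetical `SharedVertexBuffer` only computes offsets; it makes no API calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One batch of same-format vertices inside the shared buffer.
struct Batch {
    std::size_t stride;       // bytes per vertex for this batch's format
    std::size_t count;        // number of vertices in the batch
    std::size_t byteOffset;   // where the batch starts in the buffer
    std::size_t startVertex;  // byteOffset / stride, usable as a StartVertex index
};

// Packs batches of different strides into one buffer, aligning each batch
// start to a multiple of its own stride.
class SharedVertexBuffer {
public:
    Batch add(std::size_t stride, std::size_t count) {
        assert(stride > 0);
        const std::size_t offset = (size_ + stride - 1) / stride * stride;  // round up
        const Batch b{stride, count, offset, offset / stride};
        batches_.push_back(b);
        size_ = offset + stride * count;
        return b;
    }

    std::size_t sizeBytes() const { return size_; }  // total bytes to allocate
    const std::vector<Batch>& batches() const { return batches_; }

private:
    std::vector<Batch> batches_;
    std::size_t size_ = 0;
};
```

For example, 3 vertices at 32 bytes occupy bytes 0 to 96; a following 64-byte batch is pushed to offset 128 (StartVertex 2), so the only waste is at most one vertex of padding per batch.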