DevMaster.net  
Home | Forums | 3D Engines Database | Wiki | Articles/Tutorials | Game Dev Jobs | IRC Chat Network | Contact Us

Graphics Theory & Implementation > General


Speed-up & Optimization Techniques      
26/11/2003

Introduction

On this page I'll collect together various tricks I find about speeding up 3D engines. I'll present some of the more obvious ones first, because some people may have missed them, and then move on to some of the finer points. If you have any other speed up tricks/algorithms, feel free to send them to me.

Float --> int conversion

Always worth experimenting with, as this can often be slower. 2 methods I know of are using a #pragma, and an in-line function. Firstly, the #pragma:

#pragma aux RoundToInt=\
        "fistp DWORD [eax]"\
        parm nomemory [eax] [8087]\
        modify exact [8087];

Works very nicely, a WDISASM shows that the compiler drops the call to __CHP, and just in-lines the conversion. There is another method, which doesn't use #pragmas, so it should be portable to other compilers. This was pointed out by "InnerSect":

#define FIST_MAGIC ((((65536.0 * 65536.0 * 16)+(65536.0 * 0.5))* 65536.0))

int32 QuickFist(float inval)
{
	double dtemp = FIST_MAGIC + inval;
	return ((*(int32 *)&dtemp) - 0x80000000);
}

He did some benchmarks, if I remember rightly, he found the above to be faster, which is interesting, to say the least.

Pre-calculation

Pre-calculate virtually everything! You can't pre-calculate too much. However don't go overboard to the point where everything is table-driven and you don't have any freedom. Always keep an eye to versatility. First things to pre-calculate are your normals. Surface normals and vertex normals. Very important for debugging - these things can take ages to calculate. Once you have these pre-calculated, you only need to rotate them as you do the rest of your object. Do NOT translate or scale them though. They need to stay at a unit length of 1. If you do happen to scale them though, then multiply them by inverse scale; e.g.:

				| Sx  0  0  0 |
	Transform Matrix =	|  0 Sy  0  0 |
				|  0  0 Sz  0 |
				|  0  0  0  0 |

	Then you will need to do:

		NormalX *= 1 / Sx
		NormalY *= 1 / Sy
		NormalZ *= 1 / Sz

Also you don't need to transform your normals when you perform culling or lighting. Simply back-transform the other vector you are considering, and avoid a matrix/vector mul. You back-transform simply by transposing the upper 3x3 matrix:

	For every row
		For every column
			output[row][column] = input[column][row]
		End
	End

Transform the light, viewing vector, whatever against this matrix, and you've saved a lot of work. You them simply need to do the usual calculations, with the new vector, and the old normal.

Reciprocals

If you're going to do multiple divides by the same number, take the reciprocal of the number, and multiply by it. The reciprocal is simply 1/n, where n is your number. If you're really clever, you'll find something to do while the fdiv executes. A nice little trick is to warm the cache up; a cache miss can take quite a while to get the value loaded in from memory etc. So, while you're sitting there waiting for your fdiv to complete, why not read from a bunch of memory addresses, so they are ready in the cache? Seems to work well for me, inside my screen projection code. You can apply this technique well to a perspective texture mapper. It also helps for your perspective transform. And to convert back to the number given the reciprocal, do another 1/n. Extremely nice. If you're really adventurous, you could even try storing data as reciprocals, such as Z. I haven't tried this one though - it could get very hard to debug.

Vertex Tagging

Given a large list of vertices to handle, you're going to find that a lot - say half - are not visible, out of the frustum, whatever. You need to discard those quickly. An easy way of doing this is to add a 'visible' flag to the vertices.

Before you enter your processing loops, during setup, set this flag to false. Then, as you process your loops, if you find the vertex is needed, set it to true. For example if you find a triangle is visible, set all of its vertices visible flags to true. Then later on, you can simply ignore or discard these values if the flag is not set. This comes in handy for complex models, with a lot of vertices to project. Equally handy on large scenes, where a lot of geometry is not visible, and you can skip projection. If you apply this to lighting, you have to be careful though. If you clip to the view-volume, and you haven't calculated lighting, then you may find that around the edges, your routine goes crackers. You're trying to find the intersection of a defined and an undefined value. This needs a little thought...

An alternative to tagging can be done using a counter. The idea is simple. At initialization, you set a counter in the vertices to be some illegal value, such as -1 (0xFFFFFFFF). You also set another value, the frame counter, to zero (0). Then, as you process your object, if you notice that a face/vertex is to be used, you set its counter equal to the frame counter. If not, nothing is set or cleared. Then, when it comes to processing, you compare the counter to the frame counter. Those vertices/ faces that have been tagged with the current frame counter will be used, those which have not will contain some undefined value (less than current frame counter though) and will be ignored. This becomes handy in large data sets, where clearing a flag every frame can get expensive, if not done correctly.

Pointers

Pointers are very handy things. You can use them to make nice data structures, like linked lists, binary trees etc. You can also use them to address data very quickly. Say you have an array of items, each item is 27 bytes in size. You could pad that item to 32-bytes, and use a shift to calculate the address. But, you'd waste 5 bytes of memory for every array item. That'll soon add up! So, use pointers to address it. Say that 27 byte structure was your vertex. In your triangle structure, don't have int vertindex[3], have vertex *vertptr[3]. Then simply load in the pointer, address the data, and you're away.

Triangles vs Polygons

Tricky. Triangles are very nice to deal with, fast to render. But if you have say a set of 6 co-planar triangles forming a surface, and you could easily use a polygon instead, then you are doing 6 times as much work with a triangle renderer. But the triangles will be faster to render. Personally, I prefer triangles, much easier to work in a pure triangle based environment. Polygons do have their advantages, it may be worth playing with them. And if anyone out there has an algorithm to do constant delta texturing on convex polygons, I'd be very interested.

Overdraw / Underdraw

Overdraw is a speed-killer. Especially with complex rendering code. You're painting, re-painting, re-re-painting, re-re-re-painting, and so on. Losing time, losing speed. An easy way to eliminate overdraw is to front-back sort your triangles, and feed them into a Z-Buffer/S-Buffer. This causes problems though. A fully obscured triangle rendered through a Z-Buffer will be a complete waste of time. Scanning all those pixels, with nothing to render! Likewise S-Buffers suffer from complex segment insertion code, which may take its toll. For large triangles S-Buffer seems to work nicely. But, for many small triangles, its not so hot; for example, an sbuffer algorithm I made took ~30 seconds to render a ~3k triangle head, with just lambert shading. Clearly the volume of triangles overloaded the segment insertion routine. I think the solution here is to develop some efficient occlusion and VSD algorithms. A great many possibilities for work to be done here in the future.

Another problem may be "underdraw" (my little term). This is when you spend too much time processing offscreen polygons. Some helping factors here may be the fact that typically the visible polygons in a scene doesn't change dramatically. So for example in a first-person shooter, you could perhaps calculate the visible polygon set, then use the player direction to find some polygons to 'keep an eye on', making them critical if the player gets too near. Bounding volumes such as spheres/boxes help here. Hierarchial modelling may help here too, in classifying which portions of a model you can throw away.

Summary

So, heres a small set of rules to summarise how to make your engine faster:

  1. Precalculate as much as possible
  2. Don't calculate anything you don't need
  3. Don't calculate anything twice
  4. Try to use results of previous calculations again, if possible
  5. Look out for occaisions where you can 'spend a penny to save a pound'
  6. Before you code that function in assembly, ask yourself "Have I really got the best solution?"
  7. Experiment!
  8. Take risks
  9. Don't write anything off because it "looks slow"
  10. Explore your target architecture, know its quirks



Discuss this article in the forums
Print article

© 2003-2004 DevMaster.net. All Rights Reserved. Terms of Use & Privacy Policy Want to write for us? Click here