multithreading for multicore


rouncer
11-05-2008, 08:23 PM
I'm trying to get my dual core to render at twice the speed, and I'm using the following code to do it:



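HANDLE handle2;
DWORD id2;

// Render one half of the screen on a newly created thread...
handle2 = CreateThread(NULL, 0, render_scene2, (LPVOID) &scenestruct, 0, &id2);
// ...and the other half in the main thread.
render_scene2();

// Wait for the new thread to finish its half.
WaitForSingleObject(handle2, INFINITE);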



render_scene2 clips and renders a clipped portion of the screen (it's split in half). I'm rendering one half on a thread and the other in the main thread.
But I'm not getting much of a speed increase! I've used this method for calculating normals and doing some audio work and it worked fine, but here it actually seems to be going slower than just calling the procedure twice in the main thread.

Am I doing it wrong?

Reedbeta
11-05-2008, 10:05 PM
If your software renderer is vertex bound rather than pixel bound, then you might not see much speedup if both halves have to process essentially the same vertices.

I know nVidia SLI technology works by splitting the screen, but then dynamically adjusting the splitting position to balance the load between the two GPUs. You could try something similar.

I don't know much about software rendering so maybe I'm talking out of my ass here...but perhaps you could have one thread consuming the scene description and generating a list of transformed-and-lit triangles, and another thread consuming the triangles and generating the image. Something like a GPU's pipelined vertex and fragment processing. This way could get you better cache locality too, maybe (if each core has its own cache).
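
The split-adjusting idea could be sketched roughly as below, assuming the renderer can time each half of the frame. The name rebalance_split, its parameters, and the 0.5 damping factor are hypothetical and are not taken from how SLI actually does it.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: returns the split row to use for the next frame.
// split       - current split row; rows [0, split) go to one thread,
//               rows [split, height) to the other
// height      - total number of rows in the frame
// time_top    - measured time spent on rows [0, split)
// time_bottom - measured time spent on rows [split, height)
int rebalance_split(int split, int height, double time_top, double time_bottom)
{
    assert(height >= 2);
    if (time_top <= 0.0 || time_bottom <= 0.0 || split <= 0 || split >= height)
        return height / 2; // No usable measurement: fall back to an even split

    // Estimate the cost per row in each half, then find the row at which
    // both halves would take equal time under those estimates.
    double cost_top = time_top / split;
    double cost_bottom = time_bottom / (height - split);
    double ideal = height * cost_bottom / (cost_top + cost_bottom);

    // Move only halfway toward the estimate to damp oscillation (assumed factor).
    int next = static_cast<int>(split + 0.5 * (ideal - split) + 0.5);

    // Keep at least one row in each half.
    return std::max(1, std::min(next, height - 1));
}
```

If the top half took twice as long as the bottom half on a 480-row frame split at row 240, this moves the split up so the top half gets fewer rows next frame.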

SmokingRope
11-05-2008, 10:13 PM
You may be throwing more draw instructions at the graphics card in a shorter period of time, but this doesn't mean the GPU can process them any faster than it could with a single thread. It would seem like this could wreak havoc on your cache too.

rouncer
11-05-2008, 11:51 PM
You may be throwing more draw instructions at the graphics card in a shorter period of time, but this doesn't mean the GPU can process them any faster than it could with a single thread. It would seem like this could wreak havoc on your cache too.

I'm writing it from scratch on the CPUs. I'm not using a video card.

I'll see if I can come up with some more ideas myself.

Nick
11-06-2008, 01:23 AM
It looks like you're creating a new thread every time you render the scene and wait for it to terminate itself? Don't do that. Creating threads and terminating them costs a significant amount of time (it involves the O.S. having to set up a stack, initialize registers, perform some thread scheduling setup, etc).

You'll get far better results by creating the thread once and just letting it process tasks in an infinite loop. When there is no work, suspend the thread until new tasks arrive. The basic code could look something like this:

#include <windows.h>

HANDLE secondary_thread;
HANDLE scene_ready;             // Event for notifying that the scene is ready to be rendered
volatile bool notify = false;   // Flag for notifying that the secondary thread is done rendering

unsigned long __stdcall thread_routine(void *parameters)
{
    while(true)
    {
        WaitForSingleObject(scene_ready, INFINITE);
        render_scene(SECOND);   // Render the second half of the scene
        notify = true;
    }

    return 0;
}

void main_loop()
{
    scene_ready = CreateEvent(0, FALSE, FALSE, 0);   // Auto-reset event, initially unsignaled
    secondary_thread = CreateThread(0, 0, thread_routine, 0, 0, 0);

    while(not_exiting)
    {
        prepare_scene();
        SetEvent(scene_ready);   // Let the secondary thread render the second half of the scene
        render_scene(FIRST);     // Render the first half of the scene in the main thread
        while(!notify) {};       // Wait for the secondary thread to finish
        notify = false;
    }

    TerminateThread(secondary_thread, 0);   // Forcibly stop the secondary thread
    CloseHandle(scene_ready);
    CloseHandle(secondary_thread);
}
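
One possible variation on the code above, for anyone who wants to avoid the spin on notify and the forced TerminateThread: block on a condition variable in both directions and let the worker leave its loop on its own. This is only a sketch. It uses portable C++11 (std::thread, std::condition_variable) instead of the Win32 event calls, and HalfRenderer, SecondaryThread and render_frames are made-up names standing in for the real renderer.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the renderer: each half just counts frames.
struct HalfRenderer {
    int frames_first = 0;
    int frames_second = 0;
    void render_first()  { ++frames_first; }
    void render_second() { ++frames_second; }
};

class SecondaryThread {
public:
    explicit SecondaryThread(HalfRenderer &r) : renderer(r), worker([this] { run(); }) {}

    ~SecondaryThread()
    {
        {
            std::lock_guard<std::mutex> lock(mutex);
            exiting = true;
        }
        wake.notify_all();
        worker.join(); // The worker returns from run() by itself
    }

    // Ask the worker to render the second half of the current frame.
    void start_frame()
    {
        {
            std::lock_guard<std::mutex> lock(mutex);
            scene_ready = true;
            done = false;
        }
        wake.notify_all();
    }

    // Sleep (no spinning) until the worker has finished its half.
    void wait_frame()
    {
        std::unique_lock<std::mutex> lock(mutex);
        wake.wait(lock, [this] { return done; });
    }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock(mutex);
        while (true) {
            wake.wait(lock, [this] { return scene_ready || exiting; });
            if (exiting) return;
            scene_ready = false;
            lock.unlock();           // Render without holding the lock
            renderer.render_second();
            lock.lock();
            done = true;
            wake.notify_all();
        }
    }

    HalfRenderer &renderer;
    std::mutex mutex;
    std::condition_variable wake;
    bool scene_ready = false;
    bool done = false;
    bool exiting = false;
    std::thread worker; // Declared last so the members above exist before it starts
};

// Renders `frames` frames, one half per thread, and returns the counters.
HalfRenderer render_frames(int frames)
{
    assert(frames >= 0);
    HalfRenderer renderer;
    {
        SecondaryThread secondary(renderer);
        for (int i = 0; i < frames; ++i) {
            secondary.start_frame();
            renderer.render_first();
            secondary.wait_frame();
        }
    } // Destructor stops and joins the worker here
    return renderer;
}
```

The thread is still created once and reused for every frame, which is the main point of the original code. Whether sleeping beats spinning depends on how long a frame takes, so it is worth measuring both.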

Note however that this doesn't scale well beyond two cores. You always have to wait for the slowest thread, and as Reedbeta noted you do the vertex processing again on each thread.
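
One common way to soften the "wait for the slowest thread" problem is to cut the frame into many small tiles and let each thread grab the next unclaimed tile, rather than giving each thread a fixed half. This is a different technique from the fixed split above, and the sketch below is only an illustration: render_tiles and the render_tile callback are hypothetical names, and the threads are created inside the call purely for brevity (a real renderer would keep them alive between frames).

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Renders tiles 0..tile_count-1 on thread_count threads. Each thread takes the
// next unclaimed tile index from a shared atomic counter, so a thread that
// gets cheap tiles simply ends up processing more of them.
template <typename RenderTile>
void render_tiles(int tile_count, int thread_count, RenderTile render_tile)
{
    assert(tile_count >= 0 && thread_count >= 1);
    std::atomic<int> next_tile{0};

    auto worker = [&] {
        for (;;) {
            int tile = next_tile.fetch_add(1, std::memory_order_relaxed);
            if (tile >= tile_count) break;
            render_tile(tile);
        }
    };

    std::vector<std::thread> threads;
    for (int i = 1; i < thread_count; ++i)
        threads.emplace_back(worker);
    worker(); // The calling thread renders tiles too
    for (std::thread &t : threads)
        t.join();
}
```

This does not remove the duplicated vertex processing Reedbeta mentioned; it only evens out the per-pixel work.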

Architecting your code to scale well to a large number of cores is a complicated topic and still under much research. I highly recommend getting a good book on multi-core programming (http://www.amazon.com/Art-Multiprocessor-Programming-Maurice-Herlihy/dp/0123705916) before you enter that tricky territory...

rouncer
11-09-2008, 07:17 PM
Hey, thanks Nick, I'll try out what you said.

rouncer
11-12-2008, 05:21 PM
Yep, your idea worked.

I got a single quad to render at exactly double the speed.

Of course you're right, I've got more work ahead of me if I want it to run at double the speed when it's actually rendering a real scene.

I can dual-core the vertex matrix transforms easily enough, but the clipper is still a problem. If I can somehow clip the whole thing at once with all the CPUs, I've probably beaten Amdahl's law.

Thanks a lot (SetEvent!)
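
For reference, Amdahl's law bounds the overall speedup by the share of frame time that stays serial, so spreading the clipper over all cores raises the parallel share rather than getting around the law. A small sketch of the formula follows; the function name is made up for illustration.

```cpp
#include <cassert>

// Amdahl's law: upper bound on the speedup when a fraction `parallel` (0..1)
// of the single-threaded frame time is spread perfectly over `cores` cores
// and the remaining fraction stays serial.
double amdahl_speedup(double parallel, int cores)
{
    assert(parallel >= 0.0 && parallel <= 1.0 && cores >= 1);
    return 1.0 / ((1.0 - parallel) + parallel / cores);
}
```

As a purely hypothetical example, if a serial clipper took 20% of the frame, amdahl_speedup(0.8, 2) gives about 1.67 and amdahl_speedup(0.8, 4) gives 2.5. With the clipper parallelized as well, amdahl_speedup(1.0, 2) reaches the full 2.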

alphadog
11-13-2008, 09:00 AM
As Nick says, instead of creating an endless stream of threads, use and reuse a pool of them. For a quad-core, create a pool of four threads. (You can try more, but likely to no benefit.)

http://en.wikipedia.org/wiki/Thread_pool_pattern
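
A minimal sketch of such a pool, assuming portable C++11 rather than the Win32 calls used earlier in the thread. ThreadPool, submit and wait_idle are hypothetical names, and the code leaves out error handling and exceptions thrown by tasks.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(unsigned thread_count)
    {
        assert(thread_count >= 1);
        for (unsigned i = 0; i < thread_count; ++i)
            workers.emplace_back([this] { run(); });
    }

    ~ThreadPool()
    {
        {
            std::lock_guard<std::mutex> lock(mutex);
            exiting = true;
        }
        work_available.notify_all();
        for (std::thread &w : workers)
            w.join(); // Workers drain the queue, then return
    }

    // Queue a task; one sleeping worker is woken to run it.
    void submit(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(mutex);
            tasks.push(std::move(task));
            ++pending;
        }
        work_available.notify_one();
    }

    // Block until every submitted task has finished.
    void wait_idle()
    {
        std::unique_lock<std::mutex> lock(mutex);
        all_done.wait(lock, [this] { return pending == 0; });
    }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock(mutex);
        for (;;) {
            work_available.wait(lock, [this] { return exiting || !tasks.empty(); });
            if (tasks.empty()) return; // Only reached when exiting
            std::function<void()> task = std::move(tasks.front());
            tasks.pop();
            lock.unlock();             // Run the task without holding the lock
            task();
            lock.lock();
            if (--pending == 0) all_done.notify_all();
        }
    }

    std::mutex mutex;
    std::condition_variable work_available;
    std::condition_variable all_done;
    std::queue<std::function<void()>> tasks;
    int pending = 0;
    bool exiting = false;
    std::vector<std::thread> workers; // Declared last: the threads use the members above
};
```

Per frame, the renderer would submit one task per screen region and then call wait_idle() before presenting. std::thread::hardware_concurrency() can suggest a thread count, though it is allowed to return 0 when the count is unknown.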

Nick
11-13-2008, 01:28 PM
http://en.wikipedia.org/wiki/Thread_pool_pattern
Oh dear, they call that a pattern now as well? :surrender

alphadog
11-14-2008, 10:42 AM
Oh dear, they call that a pattern now as well? :surrender

Yes, and if you're not part of the pattern (collection), you're part of the anti-pattern (collection)... ;)

http://en.wikipedia.org/wiki/Anti-pattern