A few questions about shaders


starstutter
04-10-2008, 08:00 AM
I actually have quite a few questions about shaders concerning performance and quality. Feel free to answer any one you know. Much appreciated in advance, thanks:

1. In shaders, does packing things into separate functions degrade speed at all? It's in Effect files, if that makes any difference, i.e.:

shader render()
{
    do everything;
}

vs.

shader render()
{
    apply_cube_map();
    do_normal_mapping();
    // etc.
}


2. Is it better to try to do everything in one pass rather than getting the same quality with multiple passes? For example, if you do normal mapping and cube mapping in the same shader, you get bump-mapped environment mapping in one pass instead of having to do the normal mapping twice. I honestly don't know which is faster. What about in practical situations (i.e. a large-scale scene with many different materials)?

3. I can't seem to get dynamic branching to work correctly. It always executes both code paths no matter what I do (even with the earliest rejection possible). Could this be a driver problem, a card problem, or am I missing something? I'm using normal 'if' statements with conditions set by the application.

starstutter
04-10-2008, 08:18 AM
Oh, and by the way, to clear up the single- vs. multipass issue: it's using SM 3, and I have tried implementing both with multiple lights, all using normal mapping.

Strangely enough, it actually seemed to go faster when I rendered the normal mapping 3 times over than when I rendered it once with multiple lights in the same shader.

Kenneth Gorking
04-10-2008, 04:08 PM
1. Functions are always inlined. I think there has been some talk about allowing actual 'call' opcodes on GPUs, but I do not know if they exist yet.

2. It pretty much depends on what you are doing, but for the example you described, the first option will be faster because it only needs one texture lookup to get the normal. Just keep in mind that you want to keep your shaders as simple as possible while still getting the job done.
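A rough CPU sketch of the trade-off above, assuming simple Lambert lights and a normal map reduced to a single texel (Vec3, Light, NormalMap, shadeSinglePass and shadeMultiPass are all hypothetical names, not a real shader API). Both versions produce the same lit value; the multipass version simply repeats the normal fetch once per pass, with the framebuffer accumulating results the way additive blending would.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical minimal types for a CPU sketch; not a real shader API.
struct Vec3 { float x, y, z; };

static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// dir is assumed normalized and pointing toward the light.
struct Light { Vec3 dir; float intensity; };

// Stand-in for a normal map: one texel, plus a counter of texture lookups.
struct NormalMap {
    Vec3 n;
    mutable int fetches = 0;
    Vec3 sample() const { ++fetches; return n; }
};

static float lambert(Vec3 n, const Light& l) {
    return l.intensity * std::fmax(0.0f, dot(n, l.dir));
}

// Single pass: fetch the normal once, then loop over every light.
float shadeSinglePass(const NormalMap& map, const std::vector<Light>& lights) {
    Vec3 n = map.sample();
    float sum = 0.0f;
    for (const Light& l : lights) sum += lambert(n, l);
    return sum;
}

// Multipass: one draw per light. Each pass re-fetches the normal, and the
// framebuffer accumulates with additive blending (dst = dst + src).
float shadeMultiPass(const NormalMap& map, const std::vector<Light>& lights) {
    float framebuffer = 0.0f;
    for (const Light& l : lights) {
        Vec3 n = map.sample();          // repeated texture lookup per pass
        framebuffer += lambert(n, l);   // additive blend
    }
    return framebuffer;
}
```

This only counts work; it does not model GPU effects such as register pressure, instruction limits or vertex re-processing, so it cannot by itself explain measured timings like the ones reported later in the thread.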

For large scenes, you can do a depth-first pass (a depth-only pre-pass), which just lays down the depth. Then in the following passes you can execute whatever shader you want, and because of the depth pass, only pixels that are visible will be evaluated.
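A small CPU sketch of why the depth pre-pass helps, under simplified assumptions: each Fragment (a hypothetical type) covers one pixel at some depth, fragments arrive in submission order, and we only count how often the expensive pixel shader would run. Without the pre-pass, every fragment that passes a LESS depth test gets shaded even if something closer arrives later; with it, the second pass uses an EQUAL test so only the front-most fragment per pixel is shaded.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical fragment: which pixel it covers and its depth (smaller = closer).
struct Fragment { int pixel; float depth; };

// Normal rendering with a LESS depth test: a fragment is shaded whenever it is
// closer than what is already in the depth buffer, even if a later fragment
// ends up covering it (overdraw).
int shadeCountNoPrepass(const std::vector<Fragment>& frags, int numPixels) {
    std::vector<float> depth(numPixels, std::numeric_limits<float>::infinity());
    int shaded = 0;
    for (const Fragment& f : frags) {
        assert(f.pixel >= 0 && f.pixel < numPixels);
        if (f.depth < depth[f.pixel]) {
            depth[f.pixel] = f.depth;
            ++shaded;                           // expensive pixel shader runs here
        }
    }
    return shaded;
}

// Depth pre-pass: pass 1 writes depth only (no shading); pass 2 uses an EQUAL
// depth test, so only the front-most fragment of each pixel is shaded.
int shadeCountWithPrepass(const std::vector<Fragment>& frags, int numPixels) {
    std::vector<float> depth(numPixels, std::numeric_limits<float>::infinity());
    for (const Fragment& f : frags) {           // pass 1: z-only
        assert(f.pixel >= 0 && f.pixel < numPixels);
        if (f.depth < depth[f.pixel]) depth[f.pixel] = f.depth;
    }
    int shaded = 0;
    for (const Fragment& f : frags)             // pass 2: full shading
        if (f.depth == depth[f.pixel]) ++shaded;
    return shaded;
}
```

For example, three fragments drawn back to front on one pixel plus one fragment on another pixel are shaded 4 times without the pre-pass but only 2 times with it. The sketch assumes both passes produce bit-identical depths; in practice that requires the same vertex transform in every pass.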

Another option is to use one shader for many materials. I remember an old nVidia sample that did two different BRDFs on a statue in the same shader, in one pass.

In the end, I would recommend experimenting with it.

3. This is most likely a compiler optimization to avoid excessive branching in small 'if' blocks. Branching in shader code is costly because the GPU processes multiple pixels at once. If just one of them takes the 'if' branch, all the others have to wait for it to finish before they can continue.

You should only use branching for large 'if'-blocks.
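A toy cost model of the two points above, assuming a group of pixels that executes in lockstep and costs measured in abstract "cycles" (BranchCosts, flattenedCost and branchedCost are hypothetical names). Flattened code always runs both sides; a real branch pays a fixed overhead plus every side that at least one pixel in the group needs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-side costs in abstract cycles for one SIMD group.
struct BranchCosts { int ifBody; int elseBody; int branchOverhead; };

// Flattened (predicated) code: both sides always run and the result is selected.
int flattenedCost(const std::vector<bool>& takesBranch, BranchCosts c) {
    (void)takesBranch;  // the per-pixel condition does not change the cost
    return c.ifBody + c.elseBody;
}

// Real branch: the whole group pays for each side that *any* pixel needs,
// plus the cost of the branch instructions themselves.
int branchedCost(const std::vector<bool>& takesBranch, BranchCosts c) {
    assert(!takesBranch.empty());
    bool anyIf = false, anyElse = false;
    for (bool t : takesBranch) {
        if (t) anyIf = true; else anyElse = true;
    }
    return c.branchOverhead + (anyIf ? c.ifBody : 0) + (anyElse ? c.elseBody : 0);
}
```

With a large 'if' body (say 100 cycles against 10 for the other side and 4 of overhead), a coherent group that skips it costs 14 instead of 110, but a single pixel taking it raises the group to 114. With tiny bodies the overhead dominates and flattening wins, which is consistent with a compiler choosing to run both paths.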

starstutter
04-10-2008, 05:53 PM
Thanks for the response.


Quote:
1. Functions are always inlined. I think there has been some talk about allowing actual 'call' opcodes on GPUs, but I do not know if they exist yet.

Yeah, I kind of worded that funny. What I meant was...



float4 find_color(INPUT IN)    // declared before use, as HLSL requires
{
    float4 result = 0;
    // do calculations...
    return result;
}

OUTPUT CameraPS( INPUT IN )
{
    OUTPUT OUT;

    OUT.color = find_color(IN);

    return OUT;
}



That's a more literal translation of what I meant. It never goes out of the shader, it just goes out of what would be considered Main().



Quote:
2. It pretty much depends on what you are doing, but for the example you described, the first option will be faster because it only needs one texture lookup to get the normal. Just keep in mind that you want to keep your shaders as simple as possible while still getting the job done.

I was really more wondering about multipass vs. single-pass lighting. I experimented a while back with a fairly small scene. One method used 3 lights in the same pixel shader, with no repeated calculations. The other method redrew the polygons 3 times. The visual results were the same, but the multipass version ran almost 3x faster. I'm convinced I'm doing something wrong there. It doesn't make a bit of sense.


Quote:
For large scenes, you can do a depth-first pass, which just lays down the depth. Then in the following passes you can execute whatever shader you want, and because of the depth-first pass, only pixels that are visible will be evaluated.

Wait... do you mean rendering the depth (z-write) to the main render target (or back buffer, I guess), then just drawing over it with subsequent passes? There's no shader work involved, right? That's pretty clever :D


Quote:
3. Branching in shader code is costly because the GPU processes multiple pixels at once. If just one of them takes the 'if' branch, all the others have to wait for it to finish before they can continue.

But that's the thing: none of them is supposed to execute the complicated path at all. The way I disable it is:



bool complex;    // set per object from the application

OUTPUT CameraPS( INPUT IN )
{
    OUTPUT OUT;

    if (complex)
    {
        // do complex things
    }
    // do regular things

    return OUT;
}




and I enable/disable it from the application via Effect->SetBool(), changing it per object. I thought that might have been the problem, but I left it off entirely and got no different results. :(

Reedbeta
04-10-2008, 09:33 PM
Quote:
It never goes out of the shader, it just goes out of what would be considered Main().

I think Kenneth understood what you meant. What he's saying is that there is not really any such thing as a function call on a GPU. In your example, the shader compiler would just insert the code for find_color directly into CameraPS at the point where find_color is called. So, there's no performance penalty for splitting things into their own functions, but nor is there any benefit in compiled code size for doing so.

Quote:
Wait... do you mean rendering the depth (z-write) to the main render target (or back buffer, I guess), then just drawing over it with subsequent passes? There's no shader work involved, right? That's pretty clever :D

No shader work involved in the z-only pass. In fact, I've heard that many modern GPUs can write 2 pixels per clock in z-only mode, so it's really, really fast.


Quote:
The way I disable it is: ... and I enable/disable it through the application via Effect->SetBool() and change it per object.

I think what you want in this case is for the shader compiler to actually create two different versions of the shader, one for complex = true and one for complex = false. You probably don't want this to be implemented as an actual branch in the pixel shader since there is a cost associated with that, even if it is highly coherent in screen space (due to being set on a per-object basis). I'm not sure exactly what shader language this is, but if it's Cg, I know you can create an effect file that has two different techniques, one of which compiles the shader with the boolean on and the other with the boolean off. Then you can switch between the techniques in the application. I'm not sure if it will do the same thing if you have a global boolean that you set like a parameter.

Kenneth Gorking
04-10-2008, 10:08 PM
Quote:
I'm not sure if it will do the same thing if you have a global boolean that you set like a parameter.
It is possible with a global parameter; you just need to mark it as 'literal' (i.e. a compile-time constant) using cgSetParameterVariability(paramComplex, CG_LITERAL). Just make sure you mark it after you have loaded the shader and before you compile it.
For example:

load shader
mark 'complex' literal
set 'complex' = true
compile shader 1
set 'complex' = false
compile shader 2


That should produce 2 different shaders.
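Both the "two techniques" idea and the CG_LITERAL trick bake the bool into the compiled code, so each variant contains only the path it needs. As a CPU-side analogy only (this is C++17, not Cg or HLSL; Pixel, shade and selectShader are hypothetical names), a template parameter plays the role of the literal: each instantiation is compiled separately, `if constexpr` drops the untaken block entirely, and the application picks a pre-built variant per object instead of setting a runtime flag.

```cpp
#include <cassert>

// Hypothetical shading inputs; stand-ins for whatever the real shader uses.
struct Pixel { float base; float detail; };

// Complex is a compile-time constant, like a parameter marked CG_LITERAL:
// shade<true> and shade<false> are two separately compiled functions, and
// the untaken block is not emitted at all in shade<false>.
template <bool Complex>
float shade(const Pixel& p) {
    float color = p.base;               // "do regular things"
    if constexpr (Complex) {
        color += p.detail * 0.5f;       // "do complex things"
    }
    return color;
}

using ShaderFn = float (*)(const Pixel&);

// Application side: choose one of the two pre-built variants per object,
// analogous to switching techniques (or compiled shaders) between draws.
ShaderFn selectShader(bool objectIsComplex) {
    return objectIsComplex ? &shade<true> : &shade<false>;
}
```

The design choice mirrors the thread: the branch is resolved once per object on the application side, so no per-pixel branch remains in either variant.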

JarkkoL
04-12-2008, 09:36 AM
starstutter, if you set the bool for branching per object, you are doing static branching, not dynamic branching. Dynamic branching means that you evaluate a branch per pixel/vertex. If you do static branching, the driver can generate an optimized version of the shader at run time in which non-taken branches are completely eliminated. This essentially results in the same performance as if you had written the different versions of the shader yourself and set the proper shader before each draw call. It's up to the driver, though, how static branching is implemented and whether you take any performance hit from using it versus writing specialized shaders.
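A hypothetical sketch of what "the driver can generate an optimized version at run time" could look like; the thread does not say how any real driver does it, so VariantCache and its behavior are assumptions for illustration. The first draw that sees a given value of the uniform bool builds a specialized variant with the dead path stripped, and later draws with the same value reuse it.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Hypothetical model of driver-side static branching: one specialized
// variant per value of a uniform bool, built lazily and then cached.
class VariantCache {
public:
    using Variant = std::function<float(float)>;

    const Variant& get(bool complex) {
        auto it = cache_.find(complex);
        if (it == cache_.end()) {
            ++compiles_;                // stands in for a costly recompile
            assert(compiles_ <= 2);     // a bool has only two possible variants
            Variant v = complex
                ? Variant([](float x) { return x * 2.0f + 1.0f; })  // complex path kept
                : Variant([](float x) { return x; });               // complex path stripped
            it = cache_.emplace(complex, std::move(v)).first;
        }
        return it->second;
    }

    int compiles() const { return compiles_; }

private:
    std::map<bool, Variant> cache_;
    int compiles_ = 0;
};
```

Drawing five objects that alternate the flag, for example, builds only two variants; every other draw is a cache hit. Whether a real driver caches like this, and what a cache miss costs, is exactly the implementation detail JarkkoL says is up to the driver.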

starstutter
04-13-2008, 08:55 AM
Quote:
It is possible with a global parameter; you just need to mark it as 'literal' (i.e. a compile-time constant) using cgSetParameterVariability(paramComplex, CG_LITERAL).
load shader
mark 'complex' literal
set 'complex' = true
compile shader 1



Hmmm, do you know of an HLSL equivalent?

Also, can it *really* be switched per object, or only per material switch? Like:


complex = true;
shader->begin();
draw_objects();
shader->end();


or can it really be...


shader->begin();
complex = true;
draw_some_objects();
complex = false;
draw_other_objects();
shader->end();


and would the second option have the same speed benefits?