kwhatmough
08-01-2008, 03:53 PM
I need a fast way to compute sqrt(x*x + y*y + z*z) that avoids intermediate overflow and underflow.
I know that most runtime libraries (C,C++,Java) provide a 2d hypot() function that does this for 2 variables, _AND_ I realize that I can achieve my result using 2 calls to it as follows:
hypot( hypot(x, y), z)
but that seems uneccessarily slow. Is there a faster way that still avoids intermediate overflow and underflow? I am _NOT_ looking for a fast sqrt() function here. I am well aware of the _NUMEROUS_ threads on fast sqrt() and fast reciprocal sqrt(). What I am looking for is a way to scale the 3 inputs before calling <any fast sqrt method here>, and then to un-scale afterwards to obtain the accurate result.
So I started looking at how to implement a 2d hypot() function:
Define hypot(x, y) as sqrt(x*x + y*y) but that avoids intermediate overflow and underflow:
Using the fact that sqrt(x^2+y^2) == k*sqrt((x/k)^2+(y/k)^2) for k>0
you can implement a fast hypot() by choosing k to be a power of 2 close to sqrt(x*y). Since you choose k to be a power of 2, computing k, x/k, and y/k can be done using bit twiddling and knowledge of IEEE floating point representation. I can go into that further, but my problem is how to generalize this for 3 components x,y,z without simply nesting the call as hypot(hypot(x,y),z). Any suggestions?
I know that most runtime libraries (C,C++,Java) provide a 2d hypot() function that does this for 2 variables, _AND_ I realize that I can achieve my result using 2 calls to it as follows:
hypot( hypot(x, y), z)
but that seems uneccessarily slow. Is there a faster way that still avoids intermediate overflow and underflow? I am _NOT_ looking for a fast sqrt() function here. I am well aware of the _NUMEROUS_ threads on fast sqrt() and fast reciprocal sqrt(). What I am looking for is a way to scale the 3 inputs before calling <any fast sqrt method here>, and then to un-scale afterwards to obtain the accurate result.
So I started looking at how to implement a 2d hypot() function:
Define hypot(x, y) as sqrt(x*x + y*y) but that avoids intermediate overflow and underflow:
Using the fact that sqrt(x^2+y^2) == k*sqrt((x/k)^2+(y/k)^2) for k>0
you can implement a fast hypot() by choosing k to be a power of 2 close to sqrt(x*y). Since you choose k to be a power of 2, computing k, x/k, and y/k can be done using bit twiddling and knowledge of IEEE floating point representation. I can go into that further, but my problem is how to generalize this for 3 components x,y,z without simply nesting the call as hypot(hypot(x,y),z). Any suggestions?