Try both and use a profiler to find out.
My bet: sqrt(x) is faster than pow(x, 0.5) because there is a dedicated function for it (even if there wasn't one, sqrt(x) would be aliased to pow(x, 0.5) and therefore run at the same speed).
Use sqrt() for code clarity if nothing else. If you mean a square root, write it like a regular square root - just like you'd usually write \sqrt{x} instead of x^{\frac{1}{2}}.
I miss math class...
x86 has its own intrinsic sqrt function, which (at least at one time) was supposed to be faster than a division.
Use sqrt() for code clarity if nothing else.
Seconded.
Unless this calculation needs to be done in a very time-critical part of your code, in which case you should first optimize your algorithms to avoid expensive operations where possible, then carefully profile to find out which performs better.
It may even differ between CPUs, who knows.
just like you'd usually write \sqrt{x} instead of x^{\frac{1}{2}}
Funny, I tend to write the fractional exponent most of the time. It looks better if you have a very big expression under the root. I also deal with different fractional exponents, and x^{\frac{a}{b}} looks cleaner than \sqrt[b]{x^a} in most cases. Let alone x^{-\frac{a}{b}} instead of \frac{1}{\sqrt[b]{x^a}}.
EDIT: but I always set it as x^{\frac{1}{2}}, never x^{0.5}.
sqrt is probably a LOT faster than pow(n, 0.5f). Specific algorithms versus generic ones... I would not be surprised if sqrt beats pow by an order of magnitude or more. Do some testing on it.
Also, it's clearer.
I also deal with different fractional exponents, and x^{\frac{a}{b}} looks cleaner than \sqrt[b]{x^a} in most cases. Let alone x^{-\frac{a}{b}} instead of \frac{1}{\sqrt[b]{x^a}}.
That's why I added the "usually".
If 1/2 is a special case among other fractional exponents, I'd of course write it like the others. Likewise if there's multiple exponentiation involved (e.g. \left(x^{\frac{1}{2}}\right)^{\frac{1}{3}}). But if it's a "natural" square root, the radical makes it instantly obvious what you're dealing with.
I almost exclusively write square roots as exponents when I work with them on paper - mostly because it looks neater. For example, I find it difficult to get a neat-looking square root sign on the bottom of a fraction, or a neat square root over a long expression.
Nevertheless, I'd still use sqrt(x) rather than pow(x, 0.5) when I'm programming.
I found this intriguing, so I went ahead and wrote my own test for it.
The code:
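(A sketch of the sort of thing - assuming wrapper functions p and s around pow and sqrt so that gprof can attribute time to each; the names and the call count match the profile below, everything else is guesswork.)

/* build: gcc -pg test.c -lm, then run and inspect with gprof */
#include <math.h>

double p(double x) { return pow(x, 0.5); }
double s(double x) { return sqrt(x); }

int main(void)
{
    long n;
    for (n = 0; n < 1231231231L; ++n) {
        p((double)n);   /* results deliberately unused */
        s((double)n);
    }
    return 0;
}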
The result from gprof:
  %   cumulative   self              self     total
 time   seconds   seconds      calls  ms/call  ms/call  name
 49.9     60.16     60.16 1231231231     0.00     0.00  p [2]
 44.2    113.45     53.29 1231231231     0.00     0.00  s [3]
  5.3    119.81      6.35                               main [1]
  1.3    121.33      1.52                               _gmon_start_ [4]
The self-seconds column is the important one here. I would conclude that there is hardly any difference.
(I too use fractional exponents, by the way.)
That test is flawed: to do a meaningful test, you need to compile with optimisations switched on, but since the results are never used, the compiler can (and will) eliminate the calculations altogether.
It'd also help if the calculations actually did useful work, so that any concurrency could be taken advantage of.
Try something like this, compile with -O2 or -O3. You need one version for each function to test. Time it with the time command or something.
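(A sketch along those lines - rand() generates the inputs and the sum gets printed, so the compiler can't throw the calculation away; the iteration count is arbitrary:)

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < 100000000; ++i)
        sum += sqrt((double)rand());   /* other version: pow((double)rand(), 0.5) */

    printf("%f\n", sum);               /* use the result so it isn't optimised away */
    return 0;
}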
rand is not a fast function though. It may even be slower than sqrt so you're testing more rand than sqrt.
Could be, but the point is to see what happens if you replace sqrt(rand()) with pow(rand(), 0.5);
But the test as Harry posted it only tells you whether one is faster than the other, not by how much.
Just calculate sqrt(i) or the result of some simple function of i.
You also want to make sure the edge-case optimization is the same -- ideally you want the same input sets. It might make more sense to store the random numbers generated (to get around needing a 1-million-element array, you could make a 100k array and loop over it 10 times). That way you pass the same values to both.
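(A sketch of the idea - the array size and pass count follow the numbers above, the rest is an assumption:)

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define N 100000

int main(void)
{
    static double vals[N];
    double sum = 0.0;
    int i, pass;

    /* Generate the inputs once... */
    for (i = 0; i < N; ++i)
        vals[i] = (double)rand();

    /* ...then loop the 100k array ten times: both versions of the test
       now see exactly the same one million input values. */
    for (pass = 0; pass < 10; ++pass)
        for (i = 0; i < N; ++i)
            sum += sqrt(vals[i]);      /* other version: pow(vals[i], 0.5) */

    printf("%f\n", sum);
    return 0;
}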
I would agree with Evert's first reply, though. pow cannot be faster than sqrt, because if there were a way of doing pow faster than sqrt, the library implementer would just call pow from sqrt. Of course, I'm assuming a world where all the hardware is the same. It is theoretically possible that an implementation of sqrt that is faster than pow on one piece of hardware could be slower on another; however, I would imagine that to be an unlikely scenario.
Geeze... once again I've proven that all I have to do is say something and people will debate for hours!
Since I only have to make the call 200 times a second, and because I wanted fine control over the shape of the curve, I decided to use pow() with a constant so that I can just change the constant if I feel the exponential curve is too shallow or too steep.
But, don't let that stop any of you from continuing to explore which is better for square roots.
(Note to Mods: "proven" is missing from the spell check dictionary.)
Your original post led us to believe the needed square root was time-critical.
Since I only have to make the call 200 times a second, and because I wanted fine control over the shape of the curve, I decided to use pow() with a constant so that I can just change the constant if I feel the exponential curve is too shallow or too steep.
But, don't let that stop any of you from continuing to explore which is better for square roots.
That is a bit rich, coming from the guy who asked
So my question is pretty simple: Which is the fastest for the CPU?
sqrt(n);
OR
pow(n,0.5f);
It may be implementation dependent, but at least on MSVC9 sqrt() appears to be significantly faster than pow(). I would generally advocate using sqrt() regardless just for readability.
Test:
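(A sketch of the kind of harness - assuming clock()-based timing in milliseconds and an arbitrary iteration count:)

#include <math.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
    const int N = 10000000;
    double sum = 0.0;
    clock_t start, end;
    int i;

    start = clock();
    for (i = 0; i < N; ++i)
        sum += sqrt((double)i);        /* other version: pow((double)i, 0.5) */
    end = clock();

    printf("%.1f ms (sum = %f)\n",
           1000.0 * (double)(end - start) / CLOCKS_PER_SEC, sum);
    return 0;
}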
Average time across 10 tests:
sqrt: 249.6
pow: 967.3
Edit: Tests are with optimization.
That is a bit rich, coming from the guy who asked
Maybe, but I was mostly just curious because it's not something I ever put any thought into and was wondering if someone else already knew.
I think it's easy to forget with all the other problems which come up on this forum that not everyone has critical issues. If CPU time was a critical factor I would've tested this myself and never asked.
I'm still kinda surprised how many people responded to a question I thought was really simple and unimportant.
Give this Square root function a go,
float SquareRoot(float number)
{
    long i;
    float x, y;
    const float f = 1.5F;

    x = number * 0.5F;
    y = number;
    i = * ( long * ) &y;
    i = 0x5f3759df - ( i >> 1 );  //Black magic mystery number
    y = * ( float * ) &i;
    y = y * ( f - ( x * y * y ) );
    y = y * ( f - ( x * y * y ) );
    return number * y;
}
Give this Square root function a go,
Have you tried that on a 64-bit platform? Or a PowerPC?
I'm still kinda surprised how many people responded to a question I thought was really simple and unimportant.
And not one of them mentioned powf or sqrtf, which I assume may well outperform pow and sqrt with no negative side effects if you're dealing with floats anyway. I think they're both C99, so possibly available on MSVC.
Just to add a pointless observation :
you've got other options.
int root_n(int n)
{
int root = n;
while(pow(root--, 2) != n)
;
return n;
}
EDIT : oh, if someone would just say "We know you can code in C to some extent.", I'd be able to stop trying to prove myself.
Doesn't decrementing a variable within parameters to a function leave the original unchanged?
Well spotted.
I had a look through my C book and since there's no example of doing this, I surmise you must be right.
There's always the possibility of making a quick test program
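(Something like this, perhaps - the pass() function and the starting value are made up, but the printed message matches what's discussed below:)

#include <stdio.h>

/* A do-nothing function, just so i-- can be passed as an argument. */
void pass(int x)
{
    printf("pass() received %d\n", x);
}

int main(void)
{
    int i = 5;
    pass(i--);                        /* pass() receives 5... */
    printf("and i is now %d\n", i);   /* ...but this prints 4 */
    return 0;
}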
and it seems to affect i permanently?!!?
and I did use your code! so it looks like the assumption I made was erroneous.
Uh, what? My assumption was that it would print "and i is now 5", which it did not, so your assumption was correct.
Mr. Labbett, I see that you can code in C to some extent. Good job! Your capabilities are known amongst your peers, and you have become respected to some extent.
Arthur :
I meant this assumption I made :-
I had a look through my C book and since there's no example of doing this, I surmise you must be right
..or was it a surmition ?
Neil :
thanks, I feel I can move on now.
Doesn't decrementing a variable within parameters to a function leave the original unchanged?
No.
Both -- operators decrease the variable they act upon in-place, and return its value. The prefix -- operator returns the value after the decrement, the postfix -- operator returns the value before the decrement.
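A tiny illustration:

#include <stdio.h>

int main(void)
{
    int a = 5, b = 5;

    printf("--a yields %d\n", --a);   /* prints 4: value after the decrement */
    printf("a is now %d\n", a);       /* prints 4 */

    printf("b-- yields %d\n", b--);   /* prints 5: value before the decrement */
    printf("b is now %d\n", b);       /* prints 4: decremented in place either way */

    return 0;
}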
However, the code example:
int root_n(int n)
{
    int root = n;
    while(pow(root--, 2) != n)
        ;
    return n;
}
...never changes the value of n, so it always returns the input unchanged, so it only "works" for n == 1 and n == 0. Also, for negative n, it will loop indefinitely, because the pow() call always returns a non-negative number, so the while condition is always true regardless of the value of root.
Mr. Labbett, I see that you can code in C to some extent.
It is valid C, I give you that.
return root;
of course. ..and some error checking wouldn't hurt. Apologies that you had to point out my mistake, Tobias.
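(A sketch of what the fixed version might look like. Note that with while(pow(root--, 2) != n) the postfix decrement fires one last time as the loop exits, leaving root one below the answer, so the loop is restructured here; integer multiplication replaces pow() to avoid floating-point rounding. Still O(n), and only exact for perfect squares.)

#include <stdio.h>

int root_n(int n)
{
    int root;

    if (n < 0)
        return -1;                     /* would otherwise loop forever */

    for (root = n; root >= 0; --root)
        if (root * root == n)          /* beware: overflows for very large n */
            return root;

    return -1;                         /* n is not a perfect square */
}

int main(void)
{
    printf("root_n(16) = %d\n", root_n(16));   /* prints 4 */
    printf("root_n(-4) = %d\n", root_n(-4));   /* prints -1 */
    return 0;
}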
Have you tried that on a 64-bit platform? Or a PowerPC?
no i have not, why?
And not one of them mentioned powf or sqrtf, which I assume may well outperform pow an sqrt with no negative side effects if you're dealing with floats anyway.
My understanding is that, at least for x87 (or is it just x86 now) architecture, all of the floating point math is done internally on 80-bit numbers, making the 32-bit and their equivalent 64-bit floating point operations essentially equal in time.
I would imagine the same is true of 8-, 16- and 32-bit integer math on a 32-bit CPU, and additionally 64-bit math on a 64-bit CPU (or more if using MMX/SSE/3DNow!-type instructions).
I don't have any hard experimental evidence of this, just years of hearing about the architecture makes me conclude this. Maybe someone knows otherwise.
all of the floating point math is done internally on 80-bit numbers, making the 32-bit and their equivalent 64-bit floating point operations essentially equal in time
The precision can be set to how many bits to calculate, which affects some operations. Also, whether the size of double vs. float makes an array of floating-point numbers overflow the cache can make a difference.
I was thinking about the memory bandwidth in the back of my mind, and whether or not the memory access was fast enough to make the cache irrelevant, since sqrt and pow are pretty expensive operations.
I know starting with the Pentium 4, the cache misses started to get to be a pretty big deal because of the CPU being so much faster than the RAM. I don't know if that gap is widening or narrowing these days.
no i have not, why?
Because you use some nasty conversion between integers and floats there, one of the rare cases where the actual memory representation of floats and ints, which is platform dependent, may break your code.
On a 64 bit platform, long may be 64 bits instead of the expected 32, and dereferencing a long* that has been cast from a float* may produce an access violation (you're accessing 64 bits through a pointer that only has 32 bits allocated). You just can't assume that both long and float are 32 bits wide; although they are on virtually all common 32 bit platforms, they don't have to be.
On a PowerPC, endianness is different, so your bitwise logic may not work as expected (unless the byte ordering for floats is also reversed).
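(For what it's worth, a sketch of a more portable variant: a fixed-width integer and memcpy replace the pointer casts, so it no longer depends on sizeof(long); the magic constant and the two Newton-Raphson steps are unchanged:)

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

float SquareRoot(float number)
{
    const float f = 1.5F;
    float x = number * 0.5F;
    float y = number;
    uint32_t i;

    memcpy(&i, &y, sizeof i);          /* well-defined type pun */
    i = 0x5f3759df - (i >> 1);         /* black magic mystery number */
    memcpy(&y, &i, sizeof y);

    y = y * (f - (x * y * y));         /* Newton-Raphson step 1 */
    y = y * (f - (x * y * y));         /* Newton-Raphson step 2 */
    return number * y;                 /* x * 1/sqrt(x) == sqrt(x) */
}

int main(void)
{
    printf("%f vs %f\n", SquareRoot(2.0F), sqrtf(2.0F));
    return 0;
}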
I don't know if that gap is widening or narrowing these days.
I don't know either, but I imagine two or more CPU cores competing for memory access doesn't make things easier for the RAM.