Allegro.cc Forums » Programming Questions » Performance of function pointers, ifs and switch statements.

 Performance of function pointers, ifs and switch statements.
 Gnamra Member #12,503 January 2011

I wanted to compare the three mentioned in the title, and these are the results I got:

Pointers: 4.09492s
Ifs: 4.12192s
Switch: 4.0839s

I find it a bit odd that they're this close in time. In the loop with the function pointers there are no checks to do; it just does the calculation to get the index and then calls the function, right?

The reason I'm curious is that I'm toying with the idea of making a small game using, hopefully, no if statements. It may be a stupid idea, but I got it while thinking about the fact that in the real world there aren't any checks; it's all cause and effect, and I want to try to make a program that works the same way.

I'd very much appreciate it if anyone could explain why they're so close in performance, and why the switch is faster than the function pointers.

Here's the code I used to time them:

#include <iostream>
#include <chrono>
#include <cstdlib>

#define LOOPAMOUNT 10000000000ULL

int one();
int two();
int three();
int four();
int five();
int six();
int seven();
int eight();
int nine();
int ten();

int (*p[10])(void);

int main(void)
{
    p[0] = one;
    p[1] = two;
    p[2] = three;
    p[3] = four;
    p[4] = five;
    p[5] = six;
    p[6] = seven;
    p[7] = eight;
    p[8] = nine;
    p[9] = ten;

    std::chrono::time_point<std::chrono::high_resolution_clock> start, start2, start3, end, end2, end3;

    // Using function pointers
    start = std::chrono::high_resolution_clock::now();
    for (unsigned long long i = 0; i < LOOPAMOUNT; i++)
    {
        (*p[i % 10])();
    }
    end = std::chrono::high_resolution_clock::now();

    // Using if statements
    start2 = std::chrono::high_resolution_clock::now();
    for (unsigned long long i = 0; i < LOOPAMOUNT; i++)
    {
        if (i % 10 == 0) one();
        else if (i % 10 == 1) two();
        else if (i % 10 == 2) three();
        else if (i % 10 == 3) four();
        else if (i % 10 == 4) five();
        else if (i % 10 == 5) six();
        else if (i % 10 == 6) seven();
        else if (i % 10 == 7) eight();
        else if (i % 10 == 8) nine();
        else ten();
    }
    end2 = std::chrono::high_resolution_clock::now();

    // Using switch
    start3 = std::chrono::high_resolution_clock::now();
    for (unsigned long long i = 0; i < LOOPAMOUNT; i++)
    {
        switch (i % 10)
        {
            case 0: one(); break;
            case 1: two(); break;
            case 2: three(); break;
            case 3: four(); break;
            case 4: five(); break;
            case 5: six(); break;
            case 6: seven(); break;
            case 7: eight(); break;
            case 8: nine(); break;
            case 9: ten(); break;
        }
    }
    end3 = std::chrono::high_resolution_clock::now();

    std::chrono::duration<double> elapsed_seconds = end - start;
    std::chrono::duration<double> elapsed_seconds2 = end2 - start2;
    std::chrono::duration<double> elapsed_seconds3 = end3 - start3;

    std::cout << "Pointers: " << elapsed_seconds.count() << "s\n";
    std::cout << "Ifs: " << elapsed_seconds2.count() << "s\n";
    std::cout << "Switch: " << elapsed_seconds3.count() << "s\n";

    system("pause");
    return 0;
}

int one()   { return 1; }
int two()   { return 2; }
int three() { return 3; }
int four()  { return 4; }
int five()  { return 5; }
int six()   { return 6; }
int seven() { return 7; }
int eight() { return 8; }
int nine()  { return 9; }
int ten()   { return 10; }
 #00JMP00 Member #14,740 November 2012

Looking at the code, I find it strange that you use modulo. Couldn't you simplify things by just looping for(0...10)? But maybe this is only the prototype for something bigger, which will need it.

At second glance: why don't you need & to assign the functions to the function pointers? p[n] = &one... and so forth.

Last issue: why do you set up functions 2 to n, when you only test functions 0 and 1?
 Gnamra Member #12,503 January 2011

I'm using modulo to get a number from 0 to 9. I could just loop from 0 to 9, but by looping so many times the differences add up and it becomes clearer how each approach performs compared to the others.

p[n] = one is just the short form; p[n] = &one is the "correct" way of doing it. Whether you can omit the & depends on the compiler, so you should use & if you want to write portable code. I didn't care about it here, since this was just a small program to find out how the different ways of calling a bunch of functions compare to each other.

But I removed the short array of size 10, because it was redundant. I used it for something before and just changed the code without realizing how redundant it had become.

What do you mean by "set up functions 2 to n"? When I checked, all the different functions were being called.
 #00JMP00 Member #14,740 November 2012

Gnamra said: I'm using modulo to get a number from 0 to 9

You are right, I am wrong. I thought you would use num[i] % 10 to call the functions, but you use a different value. Therefore everything is fine :-)

Actually, all numbers 0 to 9 % 10 will turn out to be 0, not even 1. Therefore the test is maybe not correct. You should assign numbers up to at least 100 to num[i] to get a result:

    for (int i = 0; i < 100; i++)   // change this, to get a better result
        num[i] = i;
 Gnamra Member #12,503 January 2011

No, for i from 0 to 9, i % 10 gives 0 to 9. When it hits 10, then 10 % 10 gives 0 again. Because of this, i % 10 yields 0-9 over and over.

Also, I removed num[i]; the array was redundant, I can just do i % 10 instead. I'll edit the code.
 Chris Katko Member #1,881 January 2002

If you want to make a game without "if"s, you're going to be overwhelmed quickly, but feel free to try it. Remember, a game isn't the real world. It's a simulation... an approximation. And every time you approximate something, imagine you're putting together a nice piecewise function with an if statement.

That being said:

1 - Look at the assembly, and for the love of God turn off optimizations. For all you know, all of those turn into the same code.
2 - If/switch/etc. are all about the same anyway. It's a thousand times worse to incur a context switch than to use any branching. What you've got is a CPU with a gigantic "velocity" but a tiny "acceleration", and every time it switches contexts it has to stop and start the process over again. So things entirely out of your control (like task scheduling) will, in most cases, have more effect on your program than your choice of switches/ifs/etc.
3 - Never assume a benchmark reflects real-world performance.

-----sig: "Programs should be written for people to read, and only incidentally for machines to execute." - Structure and Interpretation of Computer Programs
"Political Correctness is fascism disguised as manners" --George Carlin
 torhu Member #2,727 September 2002

Benchmarking without optimizations isn't that useful. I'd say compile with -O2, but move the function definitions into a separate file to make sure they are not inlined. Add up all the return values and print the total at the end, to make sure the calls are not removed by the optimizer.

It's also a good idea to change the order of the items you are benchmarking, to make sure the order does not affect performance. Putting each test into its own function makes this easier.

The reason switches can be faster than ifs is that they are often optimized into jump tables, meaning there are no comparisons, only one table lookup and one or two unconditional jumps. In principle this could happen to a series of ifs too; looking at the assembly is the best way to tell.

By the way, you need some way to select which functions to call anyway, so you might end up reinventing switch statements somehow.
 bamccaig Member #7,536 July 2006

I can't figure out what it is that you're trying to benchmark here. None of these constructs should be the bottleneck in your code, but even if they were, I don't think your tests would accurately measure their performance. It's also the kind of thing that doesn't have a single right answer: what might be right on your hardware for a simple test might be wrong on your hardware in a real-world situation, or even on my hardware with the same simple test. Don't waste your time with this kind of thing unless you have a lot of it and plan to focus your career on low-level optimization (generally not very worthwhile in 99% of applications, but there are some unique systems where it is vital, and you could find employment working on those if you learn your stuff). For the most part, I can't help thinking "why would you even?"

-- I mean the best with what I say. It doesn't always sound that way.
 torhu Member #2,727 September 2002

Don't listen to bamccaig, he has forgotten what curiosity is.
 bamccaig Member #7,536 July 2006

Benchmarking without a real-world example is useless. 99% of the time, if you guess at what is making your program slow, you will be wrong (at least until you have 10 or 20 years of experience under your belt and have already been wrong often enough to know). Generally, the only reliable way to speed a program up these days is to run it through a profiler and let the profiler tell you where all the time is spent. Computers have gotten exponentially more complicated than they were 20 years ago. There may have been a time when this question was a valid one to ask, but these days it isn't, really. It's a waste of the OP's time, IMO, and it's irresponsible to encourage it instead of pointing him in a more appropriate direction.
 Audric Member #907 January 2001

torhu said: move the function definitions into a separate file to make sure they are not inlined

No, some optimizations can be performed at link time, so these functions can still be inlined. It's not so easy to outsmart a compiler!

I'm with bamccaig on this one: it's totally naive to try to optimize if you don't have a general idea of what matters. Curiosity is fine, but before you try to measure and draw conclusions, you should at least read about:
- compiler optimization techniques (especially "common subexpression elimination" and "branch prediction")
- some architecture topics, like the "instruction pipeline"

These will remove a lot of misconceptions about what actually happens at compile time and run time, and hopefully make you realize that the results of your test tell you absolutely nothing about what would happen in a real (useful) program.
 torhu Member #2,727 September 2002

Audric said: No, some optimizations can be performed at link time, so these functions can still be inlined. It's not so easy to outsmart a compiler!

Sure, but enabling link-time optimization when you specifically want to avoid inlining functions defined in other files would be a bit silly, don't you think?
 Audric Member #907 January 2001

Ah sorry, I took that statement out of context. You were advising -O2, so you're talking explicitly about GCC, and indeed, it will not perform LTO unless you request it explicitly with -flto.
 Chris Katko Member #1,881 January 2002

I'm all for curiosity; however, optimization should be the last step, not the first one. A program that works slowly still gets shipped. A program that goes fast and doesn't do anything doesn't... and in five years, a slow program runs fast on new computers (e.g. Adobe Photoshop).