|
Performance Tuning |
Onewing
Member #6,152
August 2005
|
I was reading the appendix of a book that lists several methods of performance tuning. Most of them only increase framerate by a little (though more so on older computers). It was interesting enough that I thought I'd query the infinite knowledge of a.cc to see if there are more tricks you use. ComputerScience++; Here are some of the methods the book mentions, summarized (I'll probably have to go back to the book later to make sure I've listed them correctly):
There were others but I can't remember right now. ------------ |
gnolam
Member #2,030
March 2002
|
Use the right algorithms for the task. It's going to matter a hell of a lot more than the micro-optimizations above. -- |
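To make that point concrete, here's a hypothetical sketch (names and data are mine, not from the thread): the same membership test done with a linear scan versus a hash set. For n lookups over n items that is O(n^2) versus O(n) on average, a gap no micro-optimization can close.

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Membership test, two ways. Each linear query walks the vector: O(n).
// Each hashed query is O(1) on average once the set is built.
bool contains_linear(const std::vector<int>& v, int x) {
    return std::find(v.begin(), v.end(), x) != v.end();
}

bool contains_hashed(const std::unordered_set<int>& s, int x) {
    return s.count(x) != 0;
}
```

(For very small collections the linear scan can still win thanks to cache locality; the algorithmic choice matters once n grows.)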
Tobias Dammers
Member #2,604
August 2002
|
Quote:
Yes, but readability makes for more maintainable code. Avoid pointer indirections only if you have found the particular piece of code to be a bottleneck; otherwise I'd prefer readability over a marginal or unnoticeable speed improvement. Quote:
This doesn't always hold true. Data sizes that are not multiples of the target architecture's word size (32 or 64 bits on anything >= 386) can cause misaligned data and make things slower instead of faster. Why else do you think most modern graphics cards use 32 bpp instead of 24? Quote:
Use memory pools if you have to. Here's some more:
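A minimal sketch of the memory-pool idea (my own toy example, not code from the thread): pre-allocate a fixed block of slots once, after which alloc and free are a couple of pointer moves with no heap traffic. A real pool would add alignment handling, growth, and thread safety.

```cpp
#include <cstddef>
#include <new>

// Fixed-capacity object pool: N slots threaded into a free list.
template <typename T, std::size_t N>
class Pool {
    union Slot { T obj; Slot* next; Slot() {} ~Slot() {} };
    Slot slots_[N];
    Slot* free_ = nullptr;
public:
    Pool() {
        for (std::size_t i = 0; i < N; ++i) {  // build the free list
            slots_[i].next = free_;
            free_ = &slots_[i];
        }
    }
    T* alloc() {
        if (!free_) return nullptr;            // pool exhausted
        Slot* s = free_;
        free_ = s->next;
        return new (&s->obj) T();              // placement-new: no malloc
    }
    void free(T* p) {
        p->~T();
        Slot* s = reinterpret_cast<Slot*>(p);  // obj is at offset 0
        s->next = free_;
        free_ = s;
    }
};
```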
--- |
Joel Pettersson
Member #4,187
January 2004
|
Quote:
Divide is indeed The expensive math operation, often an order of magnitude more expensive than the others. Quote:
Try doing so for large chunks of data, as it will allow you to squeeze more into the cache. Otherwise, it can be counter-productive, as noted by Tobias. Quote:
If you mean splitting functions just to keep them small, there is no reason to, unless you can reuse the resulting split-off functions several times and the code size is significantly reduced. Larger chunks of contiguous code can be optimized better. (When the calls are inlined it often doesn't matter, though, unless you access parameters by pointer or reference; that can make inline functions noticeably worse than an equivalent macro in some cases.) As for casting explicitly, doing so can sometimes result in additional, unnecessary instructions with GCC. It has happened to me when writing double calculation results to a float buffer. Quote:
Use of inline assembly or the proper library functions (not always available; may require certain defines, and often compiler-specific) can reduce conversions to a few cycles, if you don't mind rounding occurring, or are fine with setting the mode to truncation. Quote:
Branch prediction is sophisticated, so if the branch can be predicted well, this can be counter-productive (for the cases where the conditional statement is repeatedly encountered). Make sure to test it when performance is critical.
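The divide-versus-multiply advice above, as a sketch (helper names are mine): when the divisor is loop-invariant, pay for one divide up front and multiply thereafter. Note the results can differ from true division in the last bit or two, which is usually fine in game code but not always.

```cpp
#include <vector>

// Naive: one floating-point divide per element.
void scale_div(std::vector<float>& v, float d) {
    for (float& x : v) x /= d;
}

// Hoisted reciprocal: a single divide, then cheap multiplies.
void scale_recip(std::vector<float>& v, float d) {
    const float inv = 1.0f / d;
    for (float& x : v) x *= inv;
}
```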
|
HoHo
Member #4,534
April 2004
|
Avoid linked lists if possible and if your overall design allows it. Most other things have already been said. Start by thinking about a correct design that uses proper algorithms; avoiding work entirely is better than doing the work faster. Before starting to make things faster, profile to see if you are working on the right piece of code. Callgrind can be quite a nice tool when combined with the KCachegrind UI. Unfortunately, I think it only works under Linux and possibly BSD and OS X. Here is a nice screenshot of the profiling output from one of my old programs: If you really need to get close to the metal, you might want to learn to use the special libraries that allow reading the hardware performance-monitoring counters. __________ |
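For reference, a typical Callgrind session looks something like this (`./mygame` is a placeholder for your own binary; assumes valgrind and KCachegrind are installed):

```shell
# Run the program under the Callgrind tool; expect a 10-100x slowdown.
valgrind --tool=callgrind ./mygame    # writes callgrind.out.<pid>

# Plain-text cost summary, most expensive functions first.
callgrind_annotate callgrind.out.*

# Or browse the call graph interactively.
kcachegrind callgrind.out.*
```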
Onewing
Member #6,152
August 2005
|
I've been using some of these tricks to work on the performance of my Toggles game. The main menu ran at about 28-30 fps (on my computer) and I've gotten it up to 40+. The main game runs at 57-60 fps, but if I take out the "water" portion of the game, it runs at 90-100 fps. Obviously I need to work on that algorithm or try a new approach altogether; I'm thinking a method with palettes might work. And without adding a library like fblend, some of the transparency/translucency methods are bogging it down too. I'd prefer not to add another library if at all possible... ------------ |
Vasco Freitas
Member #6,904
February 2006
|
So... "x *= 0.5;" is faster than "x /= 2.0;"? |
HoHo
Member #4,534
April 2004
|
Quote: So... "x *= 0.5;" is faster than "x /= 2.0;"? Yes, it will be faster, though most compilers will probably make that optimization for you. __________ |
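A note on why the compiler is free to make this particular rewrite: 0.5 is exactly representable in binary floating point, so the two forms give bit-identical results. For a divisor like 10.0 the reciprocal 0.1 is not exact, so that substitution generally only happens under fast-math options. A small sketch:

```cpp
// These two compute exactly the same value for every x, because 0.5 is
// an exact binary fraction -- which is what licenses the compiler to
// turn the divide into a multiply.
double half_by_div(double x) { return x / 2.0; }   // what you wrote
double half_by_mul(double x) { return x * 0.5; }   // what the compiler may emit
```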
Arthur Kalliokoski
Second in Command
February 2005
|
[EDIT] Deleted stupid half baked thought. They all watch too much MSNBC... they get ideas. |
ImLeftFooted
Member #3,935
October 2003
|
The tips in the OP are good for when you write your code the first time. Except for avoiding new and delete; that one doesn't make any sense. When you go back, the best tips are the obvious ones. Quote: Yes, it will be faster, though most compilers will probably make that optimization for you. Unless 2.0 is in a variable, which it usually is. |
zenofeller_
Member #8,364
February 2007
|
i think what was meant was more along the lines of, avoid putting new and delete in for loops. which does make sense. and yes, forget / as a math symbol. / is a mistyped slash and that's all. you got \ for strings and * for numbers and that's all you'll ever need. I, however, am strongly suggesting that you START CAPITALIZING, FUCK IT! (gnolam) |
Onewing
Member #6,152
August 2005
|
Quote: Except the avoiding new and delete, that one doesn't make any sense. What I meant is keeping new/delete to a minimum and doing them at the right, logical time. You wouldn't want to create the monsters every time you entered a random battle in an RPG. You'd want to create them in a data structure at the beginning and then just give the battle engine pointers to access their data, with no load time needed (at least, that's what I'm doing in my CH title). ------------ |
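A sketch of that scheme with hypothetical names and stats (not Onewing's actual code): build every monster once at startup, and have battles borrow pointers instead of allocating mid-game.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Monster {
    std::string name;
    int hp;
};

// Owns all monsters for the whole run; created once, before any battle.
class Bestiary {
    std::vector<Monster> monsters_;
public:
    Bestiary() {
        monsters_ = { {"Slime", 10}, {"Goblin", 25}, {"Dragon", 200} };
    }
    // Battles borrow a pointer; no allocation, no load time.
    Monster* get(std::size_t i) {
        return i < monsters_.size() ? &monsters_[i] : nullptr;
    }
};
```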
ImLeftFooted
Member #3,935
October 2003
|
But new and delete are tools, great tools. Saying 'use them less' makes no sense. What would make more sense is saying "use a smart allocation scheme". But use new and delete all you want; no sense holding out. And they don't turn evil once they're inside a loop; that just makes no sense at all. |
Onewing
Member #6,152
August 2005
|
Quote: What would make more sense is saying "use a smart allocation scheme". I'm having a hard time putting my words together effectively today. I wonder if I've had a stroke. Checks memory Anyway, that's what I mean. It's the same principle as why you create a trig lookup table before the main processing: it's faster to look up the value in a table than to perform the trig function as needed. ------------ |
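The lookup-table idea in sketch form (my own toy version; the table size and the resulting precision are arbitrary choices):

```cpp
#include <cmath>

// Build a 256-entry sine table once, before the main loop, then replace
// std::sin calls with an array read. Accuracy drops to the table's
// resolution (~1.4 degrees per step), often fine for game effects.
const int TABLE_SIZE = 256;                    // power of two, so we can mask
const double TWO_PI = 6.28318530717958647692;
static double sin_table[TABLE_SIZE];

void init_sin_table() {
    for (int i = 0; i < TABLE_SIZE; ++i)
        sin_table[i] = std::sin(TWO_PI * i / TABLE_SIZE);
}

double fast_sin(double angle) {                // angle in radians
    int idx = (int)(angle / TWO_PI * TABLE_SIZE) & (TABLE_SIZE - 1);
    return sin_table[idx];
}
```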
ImLeftFooted
Member #3,935
October 2003
|
Quote: I'm having a hard time putting my words together effectively today. I wonder if I've had a stroke. Checks memory Anyway, that's what I mean. So then we agree. :) Quote: It's the same principle as to why you create a trig. lookup table before the main processing. It makes things faster to lookup the value in a table rather than performing the trig function as needed. Depends on the processor; most modern processors implement sin and cos as single instructions. But I think I understand your point. |
anonymous
Member #8025
November 2006
|
Quote:
Alas, you don't have the choice very often, do you? Quote:
I did some testing with a class member function (it took a random int and returned n+n) and called it an immense number of times in three ways: on a normal stack object (2633), on a dynamically allocated object (2644), and on another version of the class using a virtual function (3044). (Sample run; times were rather consistent.) While the difference between the first two is marginal, virtual function calls are slower. |
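A sketch of that kind of micro-test (hypothetical names; the timing code is omitted, so wrap each loop with your clock of choice):

```cpp
// Same work through a direct call and through a virtual call.
struct Direct {
    int f(int n) { return n + n; }             // can be inlined outright
};
struct Base {
    virtual int f(int n) { return n + n; }     // dispatched via the vtable
    virtual ~Base() {}
};

int run_direct(Direct& d, int iters) {
    int acc = 0;
    for (int i = 0; i < iters; ++i) acc += d.f(i);
    return acc;
}
int run_virtual(Base& b, int iters) {
    int acc = 0;
    for (int i = 0; i < iters; ++i) acc += b.f(i);
    return acc;
}
```

Both loops compute the same result; only the call mechanism differs, which is what isolates the dispatch cost.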
|