|
|
| Which one is faster |
|
Shravan
Member #10,724
February 2009
|
for(i=0;i<100;i++) { for(j=0;j<10;j++) { //Some piece of code } } or for(i=0;i<10;i++) { for(j=0;j<100;j++) { //Same code as above } }
|
|
SiegeLord
Member #7,827
October 2006
|
Any modern compiler will rewrite one as the other as needed. Don't worry about it. "For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18 |
|
OICW
Member #4,069
November 2003
|
Both run at O(M*N) so it doesn't matter. And in case of some optimisations, it doesn't really matter that much, compiler will do what he deems necessary and most of the time doing some optimisations manually is contraproductive. [My website][CppReference][Pixelate][Allegators worldwide][Who's online] |
|
anonymous
Member #8025
November 2006
|
If "some piece of code" is accessing elements in a 2d array, then it might be wiser to loop over it so as to access elements sequentially. Otherwise it might be clearer just to loop 1000 times. |
|
Goalie Ca
Member #2,579
July 2002
|
What matter is how you are touching the memory. Performance nowadays pretty much boils down to cache usage. If you can't make good use of the l2 cache then you're toast. ------------- |
|
Oscar Giner
Member #2,207
April 2002
|
Assuming that the order at wich you do the iterations doesn't matter, it's faster to do it reverse: for(i=99;i>=0;--i) { for(j=9;j>=0;--j) { //Some piece of code } }
CPU's only have comparison instructions against 0. To compare against a non zero value you need to add an extra substraction instruction and compare against 0 (i<100 is transformed to i-100<0). Of course the performance gain will be negligible in 99.99% of cases and you lose readability in your code -- |
|
type568
Member #8,381
March 2007
|
Oscar Giner said: CPU's only have comparison instructions against 0. To compare against a non zero value you need to add an extra substraction instruction and compare against 0 (i<100 is transformed to i-100<0). Of course the performance gain will be negligible in 99.99% of cases and you lose readability in your code Point me at a modern CPU failing to complete a comparison of two non zero integers in a single cycle. Append:
|
|
Goalie Ca
Member #2,579
July 2002
|
type568 said: Point me at a modern CPU failing to complete a comparison of two non zero integers in a single cycle.
L1 cache (1 or 2 cycles) IIRC, modern chips use memory locality and work best at prefetching when traversing from front to back. ------------- |
|
type568
Member #8,381
March 2007
|
Goalie Ca said:
L1 cache (1 or 2 cycles) Data retrieving isn't comparison. Also it's obvious, that my statement was regarding the difference of comparison vs zero or vs other value. In both cases, they gotta be retrieved(since CPU can't know there's the zero, I bet it also has to be retrieved..). And when all the data is finally in registers, it'll be single cycle for both of'em.
|
|
Tobias Dammers
Member #2,604
August 2002
|
The most important rule about code optimization: Don't do it. Anyway: As Goalie pointed out, the only thing that matters in this context is cache performance, and this means that you should try to iterate over arrays as sequentially as possible. If you have a 2D array to iterate over, then the inner loop should iterate over the last dimension: for (int y = 0; y < h; ++y) for (int x = 0; x < w; ++x) do_something(array[y][x]); This way, the array is accessed sequentially, and the cache is used optimally. If you access the array the other way around, the code jumps back and forth in memory, and depending on the size of the array, each jump may result in a cache miss. --- |
|
Audric
Member #907
January 2001
|
A honest question: According to some googling, I can expect page size of 4096 on Windows. If all the data I access happens to be in the same page (say, in the 2kb in the middle of a page), does it still matter ? |
|
type568
Member #8,381
March 2007
|
Tobias Dammers said: This way, the array is accessed sequentially, and the cache is used optimally. If you access the array the other way around, the code jumps back and forth in memory, and depending on the size of the array, each jump may result in a cache miss. I think it sounds more evil than it is, but yeah. That's the only thing that came in to my mind too, when thinking of performance. Though it has nothing to do with the original question. Audric said: A honest question: According to some googling, I can expect page size of 4096 on Windows. If all the data I access happens to be in the same page (say, in the 2kb in the middle of a page), does it still matter ? If you have 2kb, you shouldn't really care about it, unless it's many many times of 2kb. Or I might be literally wrong. If you could give more specific example, I think you could get a more detailed answer.
|
|
Audric
Member #907
January 2001
|
A 2-level nested loop is very often used to scan a 2D array, so I think that it really matters (what's inside the nested loop, and how your data is organized) |
|
type568
Member #8,381
March 2007
|
Audric said: A 2-level nested loop is very often used to scan a 2D array, so I think that it really matters (what's inside the nested loop, and how your data is organized) Yeah, it's what Tobias said.. Audric said: A honest question: According to some googling, I can expect page size of 4096 on Windows. If all the data I access happens to be in the same page (say, in the 2kb in the middle of a page), does it still matter ?
Perhaps I misunderstood the question.
|
|
Evert
Member #794
November 2000
|
type568 said: I think it sounds more evil than it is
Cache misses are an absolute killer. You can easily loose a factor 2 or more in speed. |
|
Audric
Member #907
January 2001
|
Ok so I guess my 16Mb lookup array for RGB->8bit color reduction was a bad idea |
|
Tobias Dammers
Member #2,604
August 2002
|
Audric said: Ok so I guess my 16Mb lookup array for RGB->8bit color reduction was a bad idea Yes, because it doesn't take the alpha channel into account. And we're not even talking about blending - a lookup table for truecolor alpha blending, now that would be lightning fast! --- |
|
type568
Member #8,381
March 2007
|
Tobias Dammers said: Yes, because it doesn't take the alpha channel into account. And we're not even talking about blending - a lookup table for truecolor alpha blending, now that would be lightning fast!
Guess I've lost my age to learn Chinese..
|
|
Audric
Member #907
January 2001
|
type568: There are 256x256x256 colors possible in RGB, I use an array of this size (224 bytes) to look up a 0-255 color. And I access this array for each pixel of a photo, so many times... |
|
Tobias Dammers
Member #2,604
August 2002
|
Audric said: 264 bytes
Which happens to be the entire address space on a 64 bit system. Even if you could have that much memory, you wouldn't be able to fit in another 8 bytes for the actual pixels you're trying to blend. --- |
|
Audric
Member #907
January 2001
|
Well, I started with my array (which is actual running code) |
|
type568
Member #8,381
March 2007
|
Audric said: There are 256x256x256 colors possible in RGB, I use an array of this size (224 bytes) to look up a 0-255 color. Not very cheap, but affordable.. 16MiB. The question is if it's any faster.. Some extra CPU cycles to save RAM access might not be as expensive.
|
|
|