Allegro.cc - Online Community

Allegro.cc Forums » Programming Questions » Which one is faster

This thread is locked; no one can reply to it. rss feed Print
Which one is faster
Shravan
Member #10,724
February 2009
avatar

for(i=0;i<100;i++)
{
   for(j=0;j<10;j++)
  {
    //Some piece of code
  }
}

or

for(i=0;i<10;i++)
{
   for(j=0;j<100;j++)
  {
    //Same code as above
  }
}

SiegeLord
Member #7,827
October 2006
avatar

Any modern compiler will rewrite one as the other as needed.

Don't worry about it.

"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18
[SiegeLord's Abode][Codes]:[DAllegro5]:[RustAllegro]

OICW
Member #4,069
November 2003
avatar

Both run at O(M*N) so it doesn't matter. And in case of some optimisations, it doesn't really matter that much, compiler will do what he deems necessary and most of the time doing some optimisations manually is contraproductive.

[My website][CppReference][Pixelate][Allegators worldwide][Who's online]
"Final Fantasy XIV, I feel that anything I could say will be repeating myself, so I'm just gonna express my feelings with a strangled noise from the back of my throat. Graaarghhhh..." - Yahtzee
"Uhm... this is a.cc. Did you honestly think this thread WOULDN'T be derailed and ruined?" - BAF
"You can discuss it, you can dislike it, you can disagree with it, but that's all what you can do with it"

anonymous
Member #8025
November 2006

If "some piece of code" is accessing elements in a 2d array, then it might be wiser to loop over it so as to access elements sequentially. Otherwise it might be clearer just to loop 1000 times.

Goalie Ca
Member #2,579
July 2002
avatar

What matter is how you are touching the memory. Performance nowadays pretty much boils down to cache usage. If you can't make good use of the l2 cache then you're toast.

-------------
Bah weep granah weep nini bong!

Oscar Giner
Member #2,207
April 2002
avatar

Assuming that the order at wich you do the iterations doesn't matter, it's faster to do it reverse:

for(i=99;i>=0;--i)
{
   for(j=9;j>=0;--j)
  {
    //Some piece of code
  }
}

CPU's only have comparison instructions against 0. To compare against a non zero value you need to add an extra substraction instruction and compare against 0 (i<100 is transformed to i-100<0). Of course the performance gain will be negligible in 99.99% of cases and you lose readability in your code :P

type568
Member #8,381
March 2007
avatar

CPU's only have comparison instructions against 0. To compare against a non zero value you need to add an extra substraction instruction and compare against 0 (i<100 is transformed to i-100<0). Of course the performance gain will be negligible in 99.99% of cases and you lose readability in your code

Point me at a modern CPU failing to complete a comparison of two non zero integers in a single cycle.

Append: :D

Goalie Ca
Member #2,579
July 2002
avatar

type568 said:

Point me at a modern CPU failing to complete a comparison of two non zero integers in a single cycle.

L1 cache (1 or 2 cycles)
L2 cache (10s cycles)
Ram (100s cycles)
Branch misprediction (10s of cycles)

IIRC, modern chips use memory locality and work best at prefetching when traversing from front to back.

-------------
Bah weep granah weep nini bong!

type568
Member #8,381
March 2007
avatar

Goalie Ca said:

L1 cache (1 or 2 cycles)
L2 cache (10s cycles)
Ram (100s cycles)
Branch misprediction (10s of cycles)

Data retrieving isn't comparison. Also it's obvious, that my statement was regarding the difference of comparison vs zero or vs other value. In both cases, they gotta be retrieved(since CPU can't know there's the zero, I bet it also has to be retrieved..).

And when all the data is finally in registers, it'll be single cycle for both of'em.

Tobias Dammers
Member #2,604
August 2002
avatar

The most important rule about code optimization: Don't do it.

Anyway: As Goalie pointed out, the only thing that matters in this context is cache performance, and this means that you should try to iterate over arrays as sequentially as possible. If you have a 2D array to iterate over, then the inner loop should iterate over the last dimension:

for (int y = 0; y < h; ++y)
  for (int x = 0; x < w; ++x)
    do_something(array[y][x]);

This way, the array is accessed sequentially, and the cache is used optimally. If you access the array the other way around, the code jumps back and forth in memory, and depending on the size of the array, each jump may result in a cache miss.

---
Me make music: Triofobie
---
"We need Tobias and his awesome trombone, too." - Johan Halmén

Audric
Member #907
January 2001

A honest question: According to some googling, I can expect page size of 4096 on Windows. If all the data I access happens to be in the same page (say, in the 2kb in the middle of a page), does it still matter ?

type568
Member #8,381
March 2007
avatar

This way, the array is accessed sequentially, and the cache is used optimally. If you access the array the other way around, the code jumps back and forth in memory, and depending on the size of the array, each jump may result in a cache miss.

I think it sounds more evil than it is, but yeah. That's the only thing that came in to my mind too, when thinking of performance. Though it has nothing to do with the original question.

Audric said:

A honest question: According to some googling, I can expect page size of 4096 on Windows. If all the data I access happens to be in the same page (say, in the 2kb in the middle of a page), does it still matter ?

If you have 2kb, you shouldn't really care about it, unless it's many many times of 2kb. Or I might be literally wrong. If you could give more specific example, I think you could get a more detailed answer.

Audric
Member #907
January 2001

A 2-level nested loop is very often used to scan a 2D array, so I think that it really matters (what's inside the nested loop, and how your data is organized)

type568
Member #8,381
March 2007
avatar

Audric said:

A 2-level nested loop is very often used to scan a 2D array, so I think that it really matters (what's inside the nested loop, and how your data is organized)

Yeah, it's what Tobias said..

Audric said:

A honest question: According to some googling, I can expect page size of 4096 on Windows. If all the data I access happens to be in the same page (say, in the 2kb in the middle of a page), does it still matter ?

Perhaps I misunderstood the question.
I believe, there maybe a case when it does, however obviously the importance(the performance difference) will be greatly decreased, and it will be uncignificant enough to be ignored.. But I think it's easier to remember about the loops order rather than break your head about the size of the accessed data.

Evert
Member #794
November 2000
avatar

type568 said:

I think it sounds more evil than it is

Cache misses are an absolute killer. You can easily loose a factor 2 or more in speed.
By the way, this is the processor cache, which is a lot smaller than your main memory (larger than 4k though). Main memory page boundaries or Windows have nothing to do with it.

Audric
Member #907
January 2001

Ok so I guess my 16Mb lookup array for RGB->8bit color reduction was a bad idea ;D

Tobias Dammers
Member #2,604
August 2002
avatar

Audric said:

Ok so I guess my 16Mb lookup array for RGB->8bit color reduction was a bad idea

Yes, because it doesn't take the alpha channel into account. And we're not even talking about blending - a lookup table for truecolor alpha blending, now that would be lightning fast!

---
Me make music: Triofobie
---
"We need Tobias and his awesome trombone, too." - Johan Halmén

type568
Member #8,381
March 2007
avatar

Yes, because it doesn't take the alpha channel into account. And we're not even talking about blending - a lookup table for truecolor alpha blending, now that would be lightning fast!

Guess I've lost my age to learn Chinese.. :(

Audric
Member #907
January 2001

type568: There are 256x256x256 colors possible in RGB, I use an array of this size (224 bytes) to look up a 0-255 color. And I access this array for each pixel of a photo, so many times...
Tobias suggested adding Alpha channel to the mix (232 bytes), or a "2d" array which stores the result of mixing 2 colors, with one color in X and one color in y. 264 bytes

Tobias Dammers
Member #2,604
August 2002
avatar

Audric said:

264 bytes

Which happens to be the entire address space on a 64 bit system. Even if you could have that much memory, you wouldn't be able to fit in another 8 bytes for the actual pixels you're trying to blend.
In other news, I was being sarcastic.

---
Me make music: Triofobie
---
"We need Tobias and his awesome trombone, too." - Johan Halmén

Audric
Member #907
January 2001

Well, I started with my array (which is actual running code)

type568
Member #8,381
March 2007
avatar

Audric said:

There are 256x256x256 colors possible in RGB, I use an array of this size (224 bytes) to look up a 0-255 color.

Not very cheap, but affordable.. 16MiB. The question is if it's any faster.. Some extra CPU cycles to save RAM access might not be as expensive.

Go to: