Which one is faster

Which one is faster

Shravan

Member #10,724

February 2009

for(i=0;i<100;i++)
{
   for(j=0;j<10;j++)
  {
    //Some piece of code
  }
}

for(i=0;i<10;i++)
{
   for(j=0;j<100;j++)
  {
    //Same code as above
  }
}

SiegeLord

Member #7,827

October 2006

Any modern compiler will rewrite one as the other as needed.

Don't worry about it.

"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18
[SiegeLord's Abode][Codes]:[DAllegro5]:[RustAllegro]

OICW

Member #4,069

November 2003

Both run at O(M*N) so it doesn't matter. And in case of some optimisations, it doesn't really matter that much, compiler will do what he deems necessary and most of the time doing some optimisations manually is contraproductive.

[My website][CppReference][Pixelate][Allegators worldwide][Who's online]
"Final Fantasy XIV, I feel that anything I could say will be repeating myself, so I'm just gonna express my feelings with a strangled noise from the back of my throat. Graaarghhhh..." - Yahtzee
"Uhm... this is a.cc. Did you honestly think this thread WOULDN'T be derailed and ruined?" - BAF
"You can discuss it, you can dislike it, you can disagree with it, but that's all what you can do with it"

anonymous

Member #8025

November 2006

If "some piece of code" is accessing elements in a 2d array, then it might be wiser to loop over it so as to access elements sequentially. Otherwise it might be clearer just to loop 1000 times.

Goalie Ca

Member #2,579

July 2002

What matter is how you are touching the memory. Performance nowadays pretty much boils down to cache usage. If you can't make good use of the l2 cache then you're toast.

-------------
Bah weep granah weep nini bong!

Oscar Giner

Member #2,207

April 2002

Assuming that the order at wich you do the iterations doesn't matter, it's faster to do it reverse:

for(i=99;i>=0;--i)
{
   for(j=9;j>=0;--j)
  {
    //Some piece of code
  }
}

CPU's only have comparison instructions against 0. To compare against a non zero value you need to add an extra substraction instruction and compare against 0 (i<100 is transformed to i-100<0). Of course the performance gain will be negligible in 99.99% of cases and you lose readability in your code

--
[Website | e-mail]
[Tetris Unlimited] [AllegAVI | AlText]

type568

Member #8,381

March 2007

Oscar Giner said:

Point me at a modern CPU failing to complete a comparison of two non zero integers in a single cycle.

Append:

Goalie Ca

Member #2,579

July 2002

type568 said:

Point me at a modern CPU failing to complete a comparison of two non zero integers in a single cycle.

L1 cache (1 or 2 cycles)
L2 cache (10s cycles)
Ram (100s cycles)
Branch misprediction (10s of cycles)

IIRC, modern chips use memory locality and work best at prefetching when traversing from front to back.

-------------
Bah weep granah weep nini bong!

type568

Member #8,381

March 2007

Goalie Ca said:

L1 cache (1 or 2 cycles)
L2 cache (10s cycles)
Ram (100s cycles)
Branch misprediction (10s of cycles)

Data retrieving isn't comparison. Also it's obvious, that my statement was regarding the difference of comparison vs zero or vs other value. In both cases, they gotta be retrieved(since CPU can't know there's the zero, I bet it also has to be retrieved..).

And when all the data is finally in registers, it'll be single cycle for both of'em.

Tobias Dammers

Member #2,604

August 2002

The most important rule about code optimization: Don't do it.

Anyway: As Goalie pointed out, the only thing that matters in this context is cache performance, and this means that you should try to iterate over arrays as sequentially as possible. If you have a 2D array to iterate over, then the inner loop should iterate over the last dimension:

for (int y = 0; y < h; ++y)
  for (int x = 0; x < w; ++x)
    do_something(array[y][x]);

This way, the array is accessed sequentially, and the cache is used optimally. If you access the array the other way around, the code jumps back and forth in memory, and depending on the size of the array, each jump may result in a cache miss.

---
Me make music: Triofobie
---
"We need Tobias and his awesome trombone, too." - Johan Halmén

Audric

Member #907

January 2001

A honest question: According to some googling, I can expect page size of 4096 on Windows. If all the data I access happens to be in the same page (say, in the 2kb in the middle of a page), does it still matter ?

type568

Member #8,381

March 2007

Tobias Dammers said:

I think it sounds more evil than it is, but yeah. That's the only thing that came in to my mind too, when thinking of performance. Though it has nothing to do with the original question.

Audric said:

If you have 2kb, you shouldn't really care about it, unless it's many many times of 2kb. Or I might be literally wrong. If you could give more specific example, I think you could get a more detailed answer.

Audric

Member #907

January 2001

A 2-level nested loop is very often used to scan a 2D array, so I think that it really matters (what's inside the nested loop, and how your data is organized)

type568

Member #8,381

March 2007

Audric said:

A 2-level nested loop is very often used to scan a 2D array, so I think that it really matters (what's inside the nested loop, and how your data is organized)

Yeah, it's what Tobias said..

Audric said:

Perhaps I misunderstood the question.
I believe, there maybe a case when it does, however obviously the importance(the performance difference) will be greatly decreased, and it will be uncignificant enough to be ignored.. But I think it's easier to remember about the loops order rather than break your head about the size of the accessed data.

Evert

Member #794

November 2000

type568 said:

I think it sounds more evil than it is

Cache misses are an absolute killer. You can easily loose a factor 2 or more in speed.
By the way, this is the processor cache, which is a lot smaller than your main memory (larger than 4k though). Main memory page boundaries or Windows have nothing to do with it.

Audric

Member #907

January 2001

Ok so I guess my 16Mb lookup array for RGB->8bit color reduction was a bad idea

Tobias Dammers

Member #2,604

August 2002

Audric said:

Ok so I guess my 16Mb lookup array for RGB->8bit color reduction was a bad idea

Yes, because it doesn't take the alpha channel into account. And we're not even talking about blending - a lookup table for truecolor alpha blending, now that would be lightning fast!

---
Me make music: Triofobie
---
"We need Tobias and his awesome trombone, too." - Johan Halmén

type568

Member #8,381

March 2007

Tobias Dammers said:

Yes, because it doesn't take the alpha channel into account. And we're not even talking about blending - a lookup table for truecolor alpha blending, now that would be lightning fast!

Guess I've lost my age to learn Chinese..

Audric

Member #907

January 2001

type568: There are 256x256x256 colors possible in RGB, I use an array of this size (2²⁴ bytes) to look up a 0-255 color. And I access this array for each pixel of a photo, so many times...
Tobias suggested adding Alpha channel to the mix (2³² bytes), or a "2d" array which stores the result of mixing 2 colors, with one color in X and one color in y. 2⁶⁴ bytes

Tobias Dammers

Member #2,604

August 2002

Audric said:

2⁶⁴ bytes

Which happens to be the entire address space on a 64 bit system. Even if you could have that much memory, you wouldn't be able to fit in another 8 bytes for the actual pixels you're trying to blend.
In other news, I was being sarcastic.

---
Me make music: Triofobie
---
"We need Tobias and his awesome trombone, too." - Johan Halmén

Audric

Member #907

January 2001

Well, I started with my array (which is actual running code)

type568

Member #8,381

March 2007

Audric said:

There are 256x256x256 colors possible in RGB, I use an array of this size (224 bytes) to look up a 0-255 color.

Not very cheap, but affordable.. 16MiB. The question is if it's any faster.. Some extra CPU cycles to save RAM access might not be as expensive.