Allegro.cc - Online Community

CPU family compilation of allegro?
Bob
Free Market Evangelist
September 2000

If we eliminate draw_compiled_sprite (which doesn't count when doing a C/asm comparison), and the 24-bit formats that the compiler has trouble with, it looks like C-only is a net win for 32 and 15/16-bpp.

That's awesome!

--
- Bob
[ -- All my signature links are 404 -- ]

Elias
Member #358
May 2000

Yes. It means we can safely remove all the asm in the 4.3.x tree. So, with all the djgpp stuff gone and the asm stuff gone, the 4.3.0 Allegro code will not be recognizable at all. It should go from big and hackish to really nice :)

--
"Either help out or stop whining" - Evert

HoHo
Member #4,534
April 2004

Please don't remove compiled sprites. If someone (me) needs absolute blitting speed, they are the way to go. With anything else you can do whatever you want.

Someone with better asm knowledge should find out what makes some functions slower in C than in asm, especially at 24 and 8 bit depths.

Also, there should be more tests before we make any assumptions about speed. E.g. I don't know how fast MSVC is. If it's considerably worse at optimizing, some might not like it if the asm routines are gone.

__________
In theory, there is no difference between theory and practice. But, in practice, there is - Jan L.A. van de Snepscheut
MMORPG's...Many Men Online Role Playing Girls - Radagar
"Is Java REALLY slower? Does STL really bloat your exes? Find out with your friendly host, HoHo, and his benchmarking machine!" - Jakub Wasilewski

Michael Jensen
Member #2,870
October 2002

Ummm, in the test results, the asm draw_compiled_sprite still kicked its C counterpart's ass, and both draw_rle_sprite versions too. So even with the argument you made, I think you'd still be stuck using draw_compiled_sprite... I could probably put together a 486, I've got enough junk that needs to be tossed out, but I doubt I will... All I'm saying is that at one point the asm had to be faster than the C, why else would they have used it? It could be that this is only true on older processors, or perhaps on older compilers that didn't optimize as well... anyway...

Elias
Member #358
May 2000

Yes. I'm not too convinced yet either. It must be only about a year since we added the asm color converters to Unix, and from what I remember, they brought some speed gain. So I'm not too sure.. but if further tests all show that the C versions are faster, we'll remove the asm.

About compiled sprites (as well as RLE sprites): I don't have the impression they are that fast, especially when not used on memory bitmaps. And remember, for 4.3.x, we also want to complete the AGL vtable, so for systems which have OpenGL available, we will have a lot more HW accelerated stuff than currently as well.


Matt Smith
Member #783
November 2000

My attempts at installing 98SE on my new machine have failed, so I can't test my old 8 bit asm blitting on a (relatively) modern Sempron. It was clearly 3 times faster than the equivalent code in C on my Celeron. The 16 bit test was 1.5 times faster, and the 32 bit was the same.

This was code written 386 style, with no regard for pipeline stalls or cache, and compiled with gcc 2.95. I think gcc traditionally has a problem with 8 & 16 bit, but maybe it is getting better.

I am no longer in favour of supporting DOS with 4.3. 4.0 is the finished article for DOS. Most of the new features don't make any sense on a single-tasking non-accelerated machine anyway.

I'm basing the console ports on 4.0.3, rather than WIP, originally to fix the feature set (which needs further trimming for GBA and PS1) but now that 4.2 doesn't compile with gcc 2.95 anymore I'm glad I did, because gcc 2.95 is all you can get for most of these consoles.

Evert
Member #794
November 2000

Quote:

now that 4.2 doesn't compile with gcc 2.95 anymore

Actually, I think that's a bug. Not sure how easy it will be to fix though.

About C vs asm, I'd be interested to know why the asm code performs so much worse before tearing it out (although my own interest in it is minimal, since it doesn't compile on an AMD64 in native mode anyway).

Richard Phipps
Member #1,632
November 2001

Now that PC's are getting more powerful I think we could sacrifice some speed if C code would be easier to maintain and develop.

I could understand why for 386 & 486's it was very useful to get the best speed possible, but I don't think this is so important now.

ReyBrujo
Moderator
January 2001

That specific ASM code has become obsolete, that is what it means. The compiler is able to find much better optimizations than the ones used in those files.

--
RB
光子「あたしただ…奪う側に回ろうと思っただけよ」
Mitsuko's last words, Battle Royale

Elias
Member #358
May 2000

Yeah. If someone contributes optimized asm code, we certainly will keep it, or even put it in again during 4.3.x after the old code was thrown out :)


Richard Phipps
Member #1,632
November 2001

So is this a conscious decision to leave 4.0.3 more suitable for DOS and older PC's, and move the 4.2/4.3 branches into being optimised or aimed more for modern PC's?

Elias
Member #358
May 2000

4.2.0 will fully support DOS, and also have all the asm code (but maybe the C version should become the default?). The next major release will be 4.4.0, which won't support DOS, and will have the new gfx API, the events API and so on, but with a compatibility layer so (most) 4.2.0 programs will also work with 4.4.0. The 4.3.x versions are the new WIP versions after 4.2.0 has been released.

About the decision.. whatever will be done will be done :)

There never was a decision to remove DOS support, but 4.3.x in CVS doesn't compile with djgpp, and nobody is interested in fixing it - so that's how the support was dropped.

In the asm case it's different: we may need to actively decide to remove it. But I guess we can just wait until someone implements the new gfx API vtable entries, which will be written in C. If there's nobody to write asm versions (and the current asm routines apparently are not worth porting), asm will have been dropped.


Richard Phipps
Member #1,632
November 2001

Thanks for that, Elias. Oh, do the newer versions of GCC produce significantly better optimisations than 2.95? I wonder how close the compiled C code is to any new handwritten optimised asm?

Michael Jensen
Member #2,870
October 2002

Quote:

Now that PC's are getting more powerful I think we could sacrifice some speed if C code would be easier to maintain and develop.

I completely agree with this... especially after reading Evert's AMD64 statement.

Quote:

do the newer versions of GCC produce significantly better optimisations than 2.95?

I have no actual education on this topic, but I would think that it's a combination of things: new hardware architectures, so old optimized asm doesn't work as well; newer compilers that optimize better than the old ones; and compilers that optimize better for newer architectures than the old ones did for the old architectures (generally anyway, I'm thinking about more than just GCC 2.95, etc).

edit:
also, while compiling/installing Allegro 4.2.0, I noticed a lot of switches with 586 in them, possibly i586? not sure, anyway, could it be that code produced with this specific version of the library wouldn't even run on 3/486s?

Peter Wang
Member #23
April 2000

Quote:

I'm seriously thinking of modifying the Allegro test so that the user only executes it and it runs most of the tests (or only selected ones, if I do it), so that it wouldn't be such a pain to test different settings.

HoHo, please do! The more and better tools we have, the more often we will bother to do profiling.

Kitty Cat
Member #2,815
October 2002

Quote:

not sure, anyway, could it be that code produced with this specific version of the library wouldn't even run on 3/486s?

If it's compiled with -mcpu or -mtune, it will still run on 3/486s. But if it's -march, it'll require the specified CPU or better. By default, I believe Allegro compiles with -mcpu=pentium (so it's Pentium-optimized, but will work on older CPUs). Personally, I compile Allegro with -march=athlon-tbird -mmmx -m3dnow (plus some switches that need GCC 3.4 or better, like -fweb -frename-registers -funswitch-loops), so it'll require an Athlon-TBird or better, with MMX and 3DNow. Since I'm using the Linux shared lib though, it won't be distributed (or if it ever is, I can build a more compatible version).
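
A quick way to see what a given set of switches actually enabled is to look at the macros GCC predefines for the build. The sketch below is only an illustration: `build_feature_mask`, `print_build_features` and the two `BUILD_HAS_*` flags are made-up names, not part of Allegro. It relies on GCC defining `__MMX__` and `__3dNOW__` when the matching code generation is switched on (for example through -mmmx, -m3dnow, or an -march value that implies them). A tuning-only switch such as -mtune changes scheduling but should not turn these on.

```c
#include <stdio.h>

/* Bit flags for this sketch only; the names are made up, not from Allegro. */
#define BUILD_HAS_MMX   1
#define BUILD_HAS_3DNOW 2

/* Report which instruction-set extensions the compiler was allowed to use
 * when this file was built. GCC predefines __MMX__ and __3dNOW__ when the
 * matching -m switches (or an -march that implies them) are active. */
int build_feature_mask(void)
{
    int mask = 0;
#ifdef __MMX__
    mask |= BUILD_HAS_MMX;
#endif
#ifdef __3dNOW__
    mask |= BUILD_HAS_3DNOW;
#endif
    return mask;
}

/* Print the result in a human-readable form. */
void print_build_features(void)
{
    int mask = build_feature_mask();
    printf("MMX code generation:    %s\n", (mask & BUILD_HAS_MMX) ? "on" : "off");
    printf("3DNow! code generation: %s\n", (mask & BUILD_HAS_3DNOW) ? "on" : "off");
}
```

Calling `print_build_features()` from a small test program built once with -mtune and once with -march should show the difference, assuming a GCC that still supports the switches in question.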

--
"Do not meddle in the affairs of cats, for they are subtle and will pee on your computer." -- Bruce Graham

Michael Jensen
Member #2,870
October 2002

Kitty: If you don't specify -march options, will you get any kind of performance increase from having those technologies (3DNow, MMX, etc) compared to someone who doesn't? Or does the extra hardware sit dormant?

Also: do all modern* computers have MMX? (modern = high-end P2+)

HoHo
Member #4,534
April 2004

In the Intel line, I think the Pentium 1 was the last one where not all models had MMX.


Evert
Member #794
November 2000

Quote:

Now that PC's are getting more powerful I think we could sacrifice some speed if C code would be easier to maintain and develop.

Well, it certainly helps portability... might be something to consider I suppose.

Quote:

There never was a decision to remove DOS support, but 4.3.x in CVS doesn't compile with djgpp, and nobody is interested in fixing it - so that's how the support was dropped.

Actually, some people seem to be interested. If they should choose to maintain the DJGPP port and keep it up to date with the rest, then I see no reason to say `no, we want to drop this'.
Which reminds me, is the old Mac version still distributed along with the rest? Does that even compile? I suspect no one has actually touched it in five years or so...

HoHo
Member #4,534
April 2004

Quote (Peter Wang):

HoHo, please do! The more and better tools we have, the more often we will bother to do profiling.

is it OK if I don't follow the usual manner of Allegro examples and tests that there are so far? I would like to use some c++ features (STL and perhaps a bit OO) and split it between several files. Also, may I use some better Gui library than Allegro builtin one?


Evert
Member #794
November 2000

Quote:

is it OK if I don't follow the usual manner of Allegro examples and tests that there are so far?

Depends a bit on to what extent they're different. If we want to fit it into Allegro itself, then I tend to say no.

Quote:

I would like to use some c++ features (STL and perhaps a bit OO) and split it between several files.

May or may not be ok, as far as the C++ part goes. I would personally say no here, but others are free to disagree of course.

Quote:

Also, may I use some better Gui library than Allegro builtin one?

What on Earth for? A test programme doesn't really need a neat looking GUI, the standard Allegro one should be fine.
Certainly if we want it to be part of Allegro it should have no external dependencies other than Allegro itself.

That said, do you want it to be a part of Allegro? You could release it as a separate package. Then again, I probably would not use it myself very often if I had to recompile it (and add-on libraries) every time I want to check something that was just changed within Allegro.

HoHo
Member #4,534
April 2004

Ok, I think I can do it with C and the Allegro GUI, but I would really want to split it to several files. I don't like scrolling around in a 4500+ line file to find something.

I don't care whether it's included in Allegro or not, that's up to the ones who decide stuff like this. But before deciding anything I'll first try actually doing it, and then we'll see how good it really comes out.


Thomas Fjellstrom
Member #476
June 2000

If it were a part of the library, as in part of the API, I'd say no C++, but as it's a tool, who cares?

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

Evert
Member #794
November 2000

Quote:

I would really want to split it to several files

My no was to the C++ part, not the multiple files part.

HoHo
Member #4,534
April 2004

[EDIT3]
The results might be wrong. I'm guiding my brother through the tests via MSN. He redid the C 24-bit version and got some results 2-4x faster and others 2-4x slower. I'm investigating this.
[edit4]
That's weird. My brother ran the 24-bit tests once again and now the C version is ~10% slower than before whereas asm is ~10% faster than before. I think on a computer this slow any kind of activity drags the results down. I guess I can't get any good measurements until the weekend when I go home.

I know, it's just that so far all Allegro examples and demos have been in single files (there was talk that the new demos will be separate).

[edit]

I've got some new benchmarking results, this time from a pentium mmx 200@225, Gentoo, gcc 3.3.4

p200@225 MMX, X with KDE, i386-pc-linux-gnu-3.3.4, 800x600
Diff % > 0 means C is faster.

                                  32 bit                  24 bit                  16 bit                   8 bit
Function                        C    ASM  Diff %        C    ASM  Diff %        C    ASM  Diff %        C    ASM  Diff %
textout()                   12407  12381    0.21     6350   6604   -3.85    11386  13061  -12.82    12167  14424  -15.65
blit() from memory           7777   7499    3.71     3035   6557  -53.71     8801  17335  -49.23     7696  17453   -55.9
masked_blit() from memory   10692   9175   16.53     4199   4727  -11.17     9693  11086  -12.57     9763  16606  -41.21
draw_sprite()               23794  19076   24.73    10843  13262  -18.24    20111  19675    2.22    23727  25422   -6.67
draw_rle_sprite()           24229  24977   -2.99    15125  19547  -22.62    26824  29598   -9.37    24799  18234      36
draw_compiled_sprite()      24020  33138  -27.52    15758  29037  -45.73    27069  49774  -45.62    25029  15631   60.12
draw_trans_sprite()          7928   3791  109.13     5473   3889   40.73     6713   4992   34.48    14308  26334  -45.67
draw_trans_rle_sprite()      7019   4131   69.91     6677   4379   52.48     7167   5598   28.03    15138  13690   10.58
draw_lit_sprite()            7575   3826   97.99     5543   3786   46.41     7686   4929   55.93    22386  24294   -7.85
draw_lit_rle_sprite()        7392   4279   72.75     6107   4349   40.42     7356   5692   29.23    21654  29968  -27.74
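
For anyone who wants to compare their own runs the same way: the Difference values above are consistent with (C / ASM - 1) * 100, which assumes the raw numbers are scores where higher is better. A minimal sketch of that arithmetic, with `percent_diff` and `print_diff` being names invented here rather than anything from the Allegro test program:

```c
#include <stdio.h>

/* Relative difference in percent between the C and asm scores.
 * A positive result means the C build scored higher. */
double percent_diff(double c_score, double asm_score)
{
    return (c_score / asm_score - 1.0) * 100.0;
}

/* Print one comparison line for a named test. */
void print_diff(const char *name, double c_score, double asm_score)
{
    printf("%-28s %8.2f%%\n", name, percent_diff(c_score, asm_score));
}
```

For example, `percent_diff(12407, 12381)` gives about 0.21, matching the 32 bit textout() row.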

More bitdepths to follow.

I think it's pretty safe to make c the default compile target if other platforms/compilers show similar results

[edit2]

Added other depths too.

As it seems, the other bit depths don't benefit as much from the C-only solution :-/ I guess the lack of branch prediction and the in-order execution give the biggest hit. Anyone got a P2 to run tests? ;D

The trans/lit ones seem to get a speed boost almost everywhere, whereas 24-bit is much slower in general blitting/sprites.

I wonder how hard it would be to find out the CPU type at runtime and modify the vtables accordingly to get the fastest function possible. Probably not so easy, and it gets questionable once OpenGL is integrated.
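
A rough sketch of what runtime selection could look like, under heavy assumptions: `demo_vtable`, `copy_row_c`, `copy_row_special`, `cpu_has_mmx` and `demo_vtable_init` are all hypothetical names and this is not Allegro's real vtable layout. The CPU check uses the `__builtin_cpu_supports` builtin that newer GCC and Clang provide on x86, and simply reports "no MMX" on anything else. The "special" routine is a plain byte loop standing in for a hand-tuned version so the example stays portable.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for copying one row of pixels. */
typedef void (*copy_row_fn)(unsigned char *dst, const unsigned char *src, size_t n);

/* Portable C version: let the compiler and libc do the work. */
void copy_row_c(unsigned char *dst, const unsigned char *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Stand-in for a CPU-specific routine. A real one would use MMX or asm;
 * here it is a plain byte loop so the sketch stays portable. */
void copy_row_special(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Hypothetical one-entry vtable. */
typedef struct {
    copy_row_fn copy_row;
} demo_vtable;

/* Runtime MMX check. Uses the GCC/Clang x86 builtins where available
 * and reports "no MMX" everywhere else. */
int cpu_has_mmx(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("mmx") ? 1 : 0;
#else
    return 0;
#endif
}

/* Fill the vtable once at startup, based on the detected CPU. */
void demo_vtable_init(demo_vtable *vt)
{
    vt->copy_row = cpu_has_mmx() ? copy_row_special : copy_row_c;
}
```

Callers would then go through `vt.copy_row(dst, src, n)` and never care which version was picked. Which routine is actually faster on a given CPU would still have to come from measurements like the ones above.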



