|
CPU family compilation of allegro? |
Bob
Free Market Evangelist
September 2000
|
If we eliminate draw_compiled_sprite (which doesn't count when doing a C/asm comparison), and the 24-bit formats that the compiler has trouble with, it looks like C-only is a net win for 32 and 15/16-bpp. That's awesome! -- |
Elias
Member #358
May 2000
|
Yes. It means we can safely remove all the asm in the 4.3.x tree. So, with all the djgpp stuff gone and the asm stuff gone, the 4.3.0 Allegro code will not be recognizable at all. From big and hackish it should go to really nice. -- |
HoHo
Member #4,534
April 2004
|
Please don't remove compiled sprites; if one (me) needs absolute blitting speed, they are the way to go. With anything else you can do whatever you want. Someone with better asm knowledge should find out what makes some functions slower in C than in asm, especially at 24-bit and 8-bit depths. There should also be more tests before we make any assumptions about speed. E.g., I don't know how fast MSVC is. If it's considerably worse at optimizing, some people might not like it if the asm routines are gone. __________ |
Michael Jensen
Member #2,870
October 2002
|
Ummm, in the test results, the asm draw_compiled_sprite still kicked its C counterpart's ass, and both draw_rle_sprites. Anyway, with the argument you made, I think you'd still be stuck using draw_compiled_sprite... I could probably put together a 486, I've got enough junk that needs to be tossed out, but I doubt I will... All I'm saying is that at one point the asm had to be faster than the C, why else would they have used it? It could be that this is only true on older processors, or perhaps on older compilers that didn't optimize as well... anyway...
|
Elias
Member #358
May 2000
|
Yes. I'm not too convinced yet either. It must be only about a year since we added the asm color converters to Unix, and from what I remember, they brought some speed gain. So I'm not too sure.. but if further tests all show that the C versions are faster, we'll remove the asm. About compiled sprites (as well as RLE sprites): I don't have the impression they are that fast, especially when not used on memory bitmaps. And remember, for 4.3.x we also want to complete the AGL vtable, so systems which have OpenGL available will get a lot more HW-accelerated stuff than they do currently. -- |
Matt Smith
Member #783
November 2000
|
My attempts at installing 98SE on my new machine have failed, so I can't test my old 8 bit asm blitting on a (relatively) modern Sempron. It was clearly 3 times faster than the equivalent code in C on my Celeron. The 16 bit test was 1.5 times faster, and the 32 bit was the same. This was code written 386 style, with no regard for pipeline stalls or cache, and compiled with gcc 2.95. I think gcc traditionally has a problem with 8 & 16 bit, but maybe it is getting better. I am no longer in favour of supporting DOS with 4.3. 4.0 is the finished article for DOS. Most of the new features don't make any sense on a single-tasking non-accelerated machine anyway. I'm basing the console ports on 4.0.3, rather than WIP, originally to fix the feature set (which needs further trimming for GBA and PS1) but now that 4.2 doesn't compile with gcc 2.95 anymore I'm glad I did, because gcc 2.95 is all you can get for most of these consoles. |
Evert
Member #794
November 2000
|
Quote: now that 4.2 doesn't compile with gcc 2.95 anymore
Actually, I think that's a bug. Not sure how easy it will be to fix though. About C vs asm, I'd be interested to know why the asm code performs so much worse before tearing it out (although my own interest in it is minimal, since it doesn't compile on an AMD64 in native mode anyway). |
Richard Phipps
Member #1,632
November 2001
|
Now that PCs are getting more powerful I think we could sacrifice some speed if C code would be easier to maintain and develop. I can understand why it was very useful to get the best speed possible on 386s and 486s, but I don't think this is so important now. |
ReyBrujo
Moderator
January 2001
|
That specific ASM code has become obsolete, that is what it means. The compiler is able to find much better optimizations than the ones used in those files. -- |
Elias
Member #358
May 2000
|
Yeah. If someone contributes optimized asm code, we certainly will keep it, or even put it in again during 4.3.x after the old was thrown out -- |
Richard Phipps
Member #1,632
November 2001
|
So is this a conscious decision to leave 4.0.3 as the version more suitable for DOS and older PCs, and to aim the 4.2/4.3 branches more at modern PCs? |
Elias
Member #358
May 2000
|
4.2.0 will fully support DOS, and also have all the asm code (but maybe the C version should become the default?). The next major release will be 4.4.0, which won't support DOS, and will have the new gfx API and events API and so on - but with a compatibility layer so (most) 4.2.0 programs will also work with 4.4.0. 4.3.x are the new WIP versions after 4.2.0 has been released. About the decision.. whatever will be done will be done. There never was a decision to remove DOS support, but 4.3.x in CVS doesn't compile with djgpp, and nobody is interested in fixing it - so that's how the support was dropped. In the asm case it's different, we may need to decide to remove it. But I guess we can just wait until someone implements the new gfx API vtable entries.. which will be written in C. If there's nobody to write asm versions (and the current asm routines apparently are not worth porting), asm will have been dropped. -- |
Richard Phipps
Member #1,632
November 2001
|
Thanks for that, Elias. Oh, do the newer versions of GCC produce significantly better optimisations than 2.95? I wonder how close the compiled C code is to any new handwritten optimised ASM? |
Michael Jensen
Member #2,870
October 2002
|
Quote: Now that PCs are getting more powerful I think we could sacrifice some speed if C code would be easier to maintain and develop.
I completely agree with this... especially after reading Evert's AMD64 statement.
Quote: do the newer versions of GCC produce significantly better optimisations than 2.95?
I have no actual education on this topic, but I would think that it's a combination of things: new hardware architectures, so old optimized asm doesn't work as well... newer compilers that optimize better than the old ones... and that optimize better for the newer architectures than the old ones did for the old architectures.... (generally anyway, I'm thinking about more than just GCC 2.95, etc) edit:
|
Peter Wang
Member #23
April 2000
|
Quote: I'm seriously thinking of modifying the Allegro test so that the user only executes it and it runs most of the tests (or only selected ones, if I do it), so that it wouldn't be such a pain to test different settings.
HoHo, please do! The more and better tools we have, the more often we will bother to do profiling.
|
Kitty Cat
Member #2,815
October 2002
|
Quote: not sure, anyway, could it be that code produced with this specific version of the library wouldn't even run on 3/486s?
If it's compiled with -mcpu or -mtune, it will still run on 3/486s. But if it's -march, it'll require the specified CPU or better. By default, I believe Allegro compiles with -mcpu=pentium (so it's Pentium-optimized, but will work on older CPUs). Personally, I compile Allegro with -march=athlon-tbird -mmmx -m3dnow (plus some GCC 3.4-or-better switches, like -fweb -frename-registers -funswitch-loops), so it'll require an Athlon-TBird or better, with MMX and 3DNow. Since I'm using the Linux shared lib though, it won't be distributed (or if it ever is, I can build a more compatible version). -- |
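The difference between tuning and targeting can be checked from inside a program, because GCC predefines feature macros only for instruction sets it is allowed to emit. What follows is a minimal sketch, assuming GCC or a compatible compiler on x86; `describe_build` is a hypothetical helper and not part of Allegro.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a short list of the instruction-set
   extensions this file was compiled to use into buf.
   __MMX__ and __SSE__ are macros GCC predefines when the matching code
   generation is enabled, for example by -mmmx or by a -march= value that
   implies it. -mtune only changes scheduling, so it adds nothing here. */
void describe_build(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    buf[0] = '\0';
#ifdef __MMX__
    strncat(buf, "mmx ", size - strlen(buf) - 1);
#endif
#ifdef __SSE__
    strncat(buf, "sse ", size - strlen(buf) - 1);
#endif
    if (buf[0] == '\0')
        strncat(buf, "baseline", size - strlen(buf) - 1);
}
```

On a 32-bit x86 target, building this once with only -mtune=pentium and once with -march=pentium-mmx would be expected to give different strings, since only the second build is allowed to emit MMX instructions.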
Michael Jensen
Member #2,870
October 2002
|
Kitty: If you don't specify -march options, will you get any kind of performance increase from having those technologies (3DNow, MMX, etc.), as opposed to someone who doesn't have them? Or does the extra hardware sit dormant? Also: do all modern* computers have MMX? (modern = high-end P2+)
|
HoHo
Member #4,534
April 2004
|
I think in the Intel line, the Pentium 1 was the last one where not all models had MMX. __________ |
Evert
Member #794
November 2000
|
Quote: Now that PCs are getting more powerful I think we could sacrifice some speed if C code would be easier to maintain and develop.
Well, it certainly helps portability... might be something to consider I suppose.
Quote: There never was a decision to remove DOS support, but 4.3.x in CVS doesn't compile with djgpp, and nobody is interested in fixing it - so that's how the support was dropped.
Actually, some people seem to be interested. If they should choose to maintain the DJGPP port and keep it up to date with the rest, then I see no reason to say 'no, we want to drop this'. |
HoHo
Member #4,534
April 2004
|
Quote (Peter Wang): HoHo, please do! The more and better tools we have, the more often we will bother to do profiling.
Is it OK if I don't follow the usual style of the existing Allegro examples and tests? I would like to use some C++ features (STL and perhaps a bit of OO) and split it between several files. Also, may I use a better GUI library than Allegro's built-in one? __________ |
Evert
Member #794
November 2000
|
Quote: is it OK if I don't follow the usual manner of Allegro examples and tests that there are so far?
Depends a bit on to what extent they're different. If we want to fit it into Allegro itself, then I tend to say no.
Quote: I would like to use some c++ features (STL and perhaps a bit OO) and split it between several files.
May or may not be ok, as far as the C++ part goes. I would personally say no here, but others are free to disagree of course.
Quote: Also, may I use some better Gui library than Allegro builtin one?
What on Earth for? A test programme doesn't really need a neat-looking GUI, the standard Allegro one should be fine. That said, do you want it to be a part of Allegro? You could release it as a separate package. Then again, I probably would not use it myself very often if I had to recompile it (and addon libraries) every time I wanted to check something that was just changed within Allegro. |
HoHo
Member #4,534
April 2004
|
Ok, I think I can do it with C and the Allegro GUI, but I would really want to split it into several files. I don't like scrolling around in a 4500+ line file to find something. I don't care whether it's included in Allegro or not, that's up to the ones who decide stuff like this. But before deciding anything I'll first try actually doing it, and then we'll see how good it really comes out. __________ |
Thomas Fjellstrom
Member #476
June 2000
|
If it were a part of the library, as in part of the API, I'd say no C++, but as it's a tool, who cares? -- |
Evert
Member #794
November 2000
|
Quote: I would really want to split it to several files
My no was to the C++ part, not the multiple files part. |
HoHo
Member #4,534
April 2004
|
[EDIT3] I know, it's just that so far all Allegro examples and demos have been in single files (there was talk that new demos will be separate).
[edit] I've got some new benchmarking results, this time from a Pentium MMX 200@225, Gentoo, gcc 3.3.4.
p200@225 MMX, X with KDE, i386-pc-linux-gnu-3.3.4, 800x600
Difference > 0 means C is faster.

                                   32-bit                 24-bit                 16-bit                  8-bit
Function                         C    ASM  Diff %       C    ASM  Diff %       C    ASM  Diff %       C    ASM  Diff %
textout()                    12407  12381    0.21    6350   6604   -3.85   11386  13061  -12.82   12167  14424  -15.65
blit() from memory            7777   7499    3.71    3035   6557  -53.71    8801  17335  -49.23    7696  17453   -55.9
masked_blit() from memory    10692   9175   16.53    4199   4727  -11.17    9693  11086  -12.57    9763  16606  -41.21
draw_sprite()                23794  19076   24.73   10843  13262  -18.24   20111  19675    2.22   23727  25422   -6.67
draw_rle_sprite()            24229  24977   -2.99   15125  19547  -22.62   26824  29598   -9.37   24799  18234      36
draw_compiled_sprite()       24020  33138  -27.52   15758  29037  -45.73   27069  49774  -45.62   25029  15631   60.12
draw_trans_sprite()           7928   3791  109.13    5473   3889   40.73    6713   4992   34.48   14308  26334  -45.67
draw_trans_rle_sprite()       7019   4131   69.91    6677   4379   52.48    7167   5598   28.03   15138  13690   10.58
draw_lit_sprite()             7575   3826   97.99    5543   3786   46.41    7686   4929   55.93   22386  24294   -7.85
draw_lit_rle_sprite()         7392   4279   72.75    6107   4349   40.42    7356   5692   29.23   21654  29968  -27.74

More bit depths to follow. I think it's pretty safe to make C the default compile target if other platforms/compilers show similar results.
[edit2] Added the other depths too. As it seems, the other bit depths don't benefit as much from the C-only solution. I guess the lack of branch prediction and the in-order execution give the biggest hit. Anyone got a P2 to run tests? The trans/lit ones seem to get a speed boost almost everywhere, whereas 24-bit is much slower in general blitting/sprites. I wonder how hard it would be to find out the CPU type at runtime and modify the vtables accordingly to get the fastest function possible.
Probably not so easy, and it gets questionable once OpenGL is integrated. __________ |
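For anyone wanting to check or extend the table, the Difference column is consistent with the relative difference of the C score over the asm score, in percent. A minimal sketch, with `percent_difference` as a hypothetical name:

```c
#include <assert.h>

/* Relative difference of the C score over the asm score, in percent.
   A positive result means the C build scored higher (was faster). */
double percent_difference(double c_score, double asm_score)
{
    assert(asm_score > 0.0);   /* a zero score would mean a failed run */
    return (c_score / asm_score - 1.0) * 100.0;
}
```

For the 32-bit textout() row, `percent_difference(12407, 12381)` comes out at roughly 0.21, which matches the posted value.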
|
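The idea of picking functions at runtime from the detected CPU can be sketched with a table of function pointers. This is only an illustration under stated assumptions: `GFX_TABLE`, `CAP_MMX`, `select_table` and both copy routines are hypothetical names, not Allegro's real vtable, and the "tuned" routine is a stand-in that simply calls memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capability bit; a real program would set it from whatever
   CPU detection it has available. */
#define CAP_MMX 0x1u

/* Hypothetical, much reduced vtable with a single entry. */
typedef struct GFX_TABLE {
    const char *name;
    void (*copy_pixels)(unsigned char *dst, const unsigned char *src, size_t n);
} GFX_TABLE;

/* Portable fallback: plain byte-by-byte copy in C. */
static void copy_pixels_c(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i;
    assert(dst != NULL && src != NULL);
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Stand-in for a hand-tuned routine; here it only forwards to memcpy. */
static void copy_pixels_tuned(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n);
}

static const GFX_TABLE table_generic = { "generic C", copy_pixels_c };
static const GFX_TABLE table_tuned   = { "tuned",     copy_pixels_tuned };

/* Chooses a table once at startup, so the per-call cost stays at one
   indirect call regardless of which implementation was selected. */
const GFX_TABLE *select_table(unsigned cpu_caps)
{
    return (cpu_caps & CAP_MMX) ? &table_tuned : &table_generic;
}
```

A caller would do something like `const GFX_TABLE *t = select_table(caps); t->copy_pixels(dst, src, n);`. The hard part raised in the post, knowing which implementation is actually fastest on a given CPU, is not solved by this; the selection rule here is just a placeholder.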
|