|
CPU family compilation of allegro? |
A J
Member #3,025
December 2002
|
As the asm code is now possibly the cause of slow code, it might be time to review its usefulness. ___________________________ |
HoHo
Member #4,534
April 2004
|
./configure --enable-asm=no [edit] Someone with Windows access should compile several versions of Allegro and the test program (statically linked). Then several people should run these on different machines to see how big a speed gain/loss it is to use C-only vs asm. __________ |
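A rough idea of what such a C-vs-asm comparison could look like, as a minimal sketch rather than Allegro's actual test program. The names `blit_fn`, `copy_c` and `time_blit` are hypothetical; the asm-enabled build would supply a second function with the same signature.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Signature shared by the implementations being compared (hypothetical). */
typedef void (*blit_fn)(unsigned char *dst, const unsigned char *src, size_t n);

/* Plain C copy; stands in for a routine from the C-only build. */
void copy_c(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Calls fn `iterations` times and returns the CPU time used, in seconds. */
double time_blit(blit_fn fn, unsigned char *dst, const unsigned char *src,
                 size_t n, int iterations)
{
    assert(fn != NULL && iterations > 0);
    clock_t start = clock();
    for (int i = 0; i < iterations; i++)
        fn(dst, src, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Running `time_blit` once per implementation with the same buffers and iteration count gives two numbers to compare. `clock()` is coarse, so the buffers and the iteration count need to be large enough for the run to take a noticeable amount of time.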
gillius
Member #119
April 2000
|
I'd like to see HoHo's test run on MSVC 7.1 with SSE or SSE2 enabled. I have personally seen it generate SSE asm code for floating point operations (are they used in Allegro, even?). In my Direct3D game, IIRC, SSE2 increased the performance by 10-20%, but the game was heavy on vector math, which is what SSE was meant for. Gillius |
HoHo
Member #4,534
April 2004
|
Quote: I'd like to see HoHo's test run on MSVC 7.1 with SSE or SSE2 enabled.
I don't quite get what you mean by this. Do you want me to run the Allegro test compiled with MSVC 7.1? If so, I guess I could install XP on my computer's spare partition to test it. [edit] The only thing is that I haven't set up my development environment in XP, so it would take quite some time to get things compiling in it. __________ |
gillius
Member #119
April 2000
|
I didn't mean that you specifically, or even anyone, should do it. I simply said that I was curious to see the results of MSVC 7.1's optimizations for the P4 compared with GCC's. If MSVC can leverage SSE better than GCC, then perhaps we will see the C code far outperform the ASM code on MSVC, while on GCC it really is still quite a toss-up (although enough of a toss-up that, had I only the C code now, I wouldn't waste my time writing an ASM version). Gillius |
Oscar Giner
Member #2,207
April 2002
|
Quote: ALLEGRO_USE_C=1 (GCC-based platforms only)
So how can I build the C-only version of Allegro with MSVC? TARGET_ARCH_EXC is also for GCC only, but I just modified the makefile directly. I added /G7 /arch:SSE2 to CFLAGS. -- |
A J
Member #3,025
December 2002
|
Oscar, you might want to use -LTCG (link-time code generation) also. ___________________________ |
Raf256
Member #3,501
May 2003
|
I think it might be a good idea to change some ASM code to C now, because "write in asm for speed" is becoming a bit of an urban legend today. I'm no expert on the subject, but I would really suggest trying:
- running a profiler and then using its output to optimize further (can this be run when building Allegro? It would require writing a test function that calls all available functions a few times to give the profiler a chance to analyze the code, right?)
- adding a builder that makes several versions of Allegro, such as i386, 486, 586 and so on. The correct version of the library would then be loaded (a game would be distributed with alleg.i386.dll, alleg.i586.dll and so on, and perhaps the same for *.so on Linux).
I think it might give a noticeable speed boost, which is the best thing in Allegro right after its win32/Linux/... portability. |
Evert
Member #794
November 2000
|
Quote: So how can I do the C-only version of Allegro with MSVC?
Ideally you could, but I think the GDI driver (and maybe the DirectX one too) uses inline assembler anyway, which makes it impossible to build the C-only version of the library with MSVC alone.
Quote: because "write in asm for speed" is becoming a bit of an urban legend today
Not quite. The output of a compiler for a modern processor is going to beat hand-optimized assembly targeting a 386 though.
Quote: But it can be done in GCC (no need for asm) using neat macros, which by the way will expand either to SSE opcodes or to i386 opcodes and so on, depending on the selected architecture, which makes the build process (see below) and maintenance much easier \o/ I can't find the link to the page describing it right now though (anyone? Those were macros like VS8_ADD(..) or something?)
Fine, but someone has to do the work (and make sure the C-only version remains processor neutral in the process).
Quote: running a profiler and then using its output to optimize further (can this be run when building Allegro? It would require writing a test function that calls all available functions a few times to give the profiler a chance to analyze the code, right?)
This is always done before applying a patch designed to increase code speed. Also check the timing functions in the test programme.
Quote: adding a builder that makes several versions of Allegro, such as i386, 486, 586 and so on. The correct version of the library would then be loaded (a game would be distributed with alleg.i386.dll, alleg.i586.dll and so on, and perhaps the same for *.so on Linux),
This complicates the build process considerably and makes everything harder to maintain, especially since Intel processors are not the only target for Allegro (AMD 64 and Macintosh systems being the main alternatives). It also makes the build process pretty slow, since everything has to be compiled multiple times. I also wouldn't take kindly to a game I download coming with binaries for ten different types of processor. |
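One GCC feature that fits the quoted description of architecture-dependent expansion is its vector extensions. Whether that is what the quoted post had in mind is an assumption; the macro name mentioned there is not used here, and `v4sf`, `add4` and `sum4` are names made up for this sketch.

```c
#include <assert.h>

/* GCC/Clang vector extension: four floats handled as one 16-byte value. */
typedef float v4sf __attribute__((vector_size(16)));

static_assert(sizeof(v4sf) == 16, "v4sf should hold four 4-byte floats");

/* Element-wise add. The compiler decides, from the -march/-msse flags in
   effect, whether this becomes SSE instructions or plain scalar code. */
v4sf add4(v4sf a, v4sf b)
{
    return a + b;
}

/* Sums the four lanes; individual lanes are read with array syntax. */
float sum4(v4sf v)
{
    return v[0] + v[1] + v[2] + v[3];
}
```

The source stays the same for every target, which is the maintenance advantage the quote refers to. It does tie the code to GCC-compatible compilers, so it would not help the MSVC build.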
Bob
Free Market Evangelist
September 2000
|
Quote: Writing optimized code should be a bit simpler.
You wish. My guess is that programming for the Cell is going to be like programming for the Emotion Engine, except 10x more difficult. Things like "caches" and "virtual memory" don't make optimizing harder, they make optimizing a whole lot easier! You don't have to worry (much) about where your data is and in what order you access it; the CPU figures it out for you. -- |
HoHo
Member #4,534
April 2004
|
Actually I meant really optimized code. Something like in Pixomatic. If it's not so easy, then all we can do is hope that they have a hell of a good compiler for it (or at least a huge and useful manual). __________ |
Bob
Free Market Evangelist
September 2000
|
Quote: if it's not so easy then all we can do is hope that they have a hell of a good compiler for it
Sure, in 10-15 years, just like every other architecture. There is nothing in CELL that makes compiler writers' lives easier.
Quote: (or at least a huge and useful manual)
Likely. Not sure how useful that would be.
Quote: Actually I meant really optimized code. Something like in Pixomatic
I'll wait till I see it run on the Cell. -- |
HoHo
Member #4,534
April 2004
|
What might help the compiler is the fact that it has an in-order core. If the compiler knows exactly how the underlying CPU works, optimizing gets a bit easier. With Pixomatic I meant that the guys who programmed it used a lot of extreme optimization tricks to speed it up as much as possible, but quite often the program still behaved a bit differently than they expected because the CPU reordered some instructions and altered delays. There have been three long articles in Dr. Dobb's Journal about how they developed the Pixomatic engine. __________ |
Raf256
Member #3,501
May 2003
|
IMHO my idea of building several versions is good: just make it an option and it's a win-win situation. By default liballegro would build quickly in 386 mode. With ./configure --multi-arch the build would take longer, but all (or all selected) libs would be built. |
HoHo
Member #4,534
April 2004
|
You can always statically link your executable and provide ~20 different versions of it. If you want to be smart, you create one program that checks the CPU's capabilities and then launches the fastest program file. __________ |
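The "check the CPU, then launch the matching file" idea could be sketched roughly as follows. This assumes a GCC-compatible compiler on x86 for the `__builtin_cpu_supports` check; the suffix names and the helper names `cpu_suffix` and `build_binary_path` are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Suffix of the build variant best suited to the running CPU.
   The suffix names are made up for this sketch. */
static const char *cpu_suffix(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();                 /* GCC/Clang builtin, x86 only */
    if (__builtin_cpu_supports("sse2"))
        return "sse2";
    if (__builtin_cpu_supports("mmx"))
        return "mmx";
#endif
    return "i386";                        /* lowest common denominator */
}

/* Writes "<base>.<suffix>" into out; returns 1 on success, 0 if it did not fit. */
int build_binary_path(char *out, size_t out_size, const char *base)
{
    assert(out != NULL && base != NULL && out_size > 0);
    int n = snprintf(out, out_size, "%s.%s", base, cpu_suffix());
    return n > 0 && (size_t)n < out_size;
}
```

A launcher would pass the resulting path to whatever the platform offers for starting a process, for example `execv` on POSIX systems. The fallback branch keeps the launcher itself runnable on any CPU.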
A J
Member #3,025
December 2002
|
Quote: my idea of building several versions is good
And who is going to write these versions? ___________________________ |
HoHo
Member #4,534
April 2004
|
He meant building a separate version for every CPU architecture: __________ |
Kitty Cat
Member #2,815
October 2002
|
-march would have a bigger effect than -mtune. -mtune still produces backwards-compatible code, so using, say, -mtune=pentium2 would still produce i386-compatible code, but tweak it to run better on P2s (which could be just as good for a P3 or Athlon-XP, given the must-be-i386-compatible restriction). And why build arch-dependent versions if you're only likely to use one? -- |
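One way to see the difference between the two flags is to look at the instruction-set macros GCC predefines for a build. The sketch below relies on GCC's `__MMX__`, `__SSE__` and `__SSE2__` macros; the function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Highest x86 SIMD level the compiler was allowed to use for this build.
   -march=... changes which of these macros are defined; -mtune=... only
   affects scheduling and leaves them alone. */
const char *compiled_isa(void)
{
#if defined(__SSE2__)
    return "sse2";
#elif defined(__SSE__)
    return "sse";
#elif defined(__MMX__)
    return "mmx";
#else
    return "baseline";
#endif
}

/* Formats a one-line description of the build into out. */
int describe_build(char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    return snprintf(out, out_size, "compiled for: %s", compiled_isa());
}
```

Building this for 32-bit x86 once with `-march=i386 -mtune=pentium2` and once with `-march=pentium3` shows the point of the post: only the second build is allowed to use SSE, and only the second build would fail to run on an older CPU.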
Michael Jensen
Member #2,870
October 2002
|
You could simply make Allegro build several dynamically linked libraries, one each for 386, 486, P5, etc., and just implement the entire API with function pointers, except allegro_init(). That function could set up the API when the program starts, binding the pointers based on what kind of beast you actually have... Only the API would be optimized, not your user code, though, and the DLLs/SOs/whatevers would be huge. -- oh, and it would be a waste of time since Allegro is already fast enough...
|
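The function-pointer scheme described in the post above could look something like this in miniature. It is a sketch under assumptions, not Allegro code: `api_init`, `api_fill`, `fill` and the two implementations are hypothetical stand-ins, and both variants live in one file here instead of separate libraries.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*fill_fn)(unsigned char *dst, unsigned char value, size_t n);

/* Portable implementation, always available. */
static void fill_generic(unsigned char *dst, unsigned char value, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = value;
}

/* Stand-in for a CPU-specific build of the same routine. A real library
   would compile it with different -march flags or load it from a DLL/.so. */
static void fill_tuned(unsigned char *dst, unsigned char value, size_t n)
{
    while (n--)
        *dst++ = value;
}

/* The API entry is a pointer, bound once at start-up. */
fill_fn api_fill = NULL;

/* Hypothetical init routine: picks an implementation for this CPU. */
void api_init(int cpu_has_fast_path)
{
    api_fill = cpu_has_fast_path ? fill_tuned : fill_generic;
}

/* Callers go through this wrapper, which catches use before api_init(). */
void fill(unsigned char *dst, unsigned char value, size_t n)
{
    assert(api_fill != NULL);
    api_fill(dst, value, n);
}
```

Every call pays for one indirect jump, which is small next to the cost of drawing a sprite. As the post notes, only the library side benefits; user code is still compiled once for one target.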
HoHo
Member #4,534
April 2004
|
One reason there is probably no point in creating several versions of the library is that it mostly benefits newer machines, which are mostly fast enough already. One thing we could consider is replacing some of the current asm-based functions with their C counterparts where those are faster (draw_rle_sprite seems to benefit most). __________ |
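For readers who have not looked inside an RLE sprite routine, here is roughly what a C version of the inner loop involves. The encoding below is a hypothetical 8-bit one chosen for the sketch and is not claimed to match Allegro's internal format; `draw_rle_row` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Draws one RLE-encoded row into dst and returns a pointer just past the
   row's end marker.
   Hypothetical encoding:
     n > 0  : the next n bytes are opaque pixels, copy them
     n < 0  : skip -n transparent pixels
     n == 0 : end of row */
const signed char *draw_rle_row(unsigned char *dst, const signed char *rle)
{
    assert(dst != NULL && rle != NULL);
    for (;;) {
        int n = *rle++;
        if (n == 0)
            return rle;
        if (n > 0) {
            memcpy(dst, rle, (size_t)n);  /* opaque run: straight copy */
            dst += n;
            rle += n;
        } else {
            dst += -n;                    /* transparent run: just advance */
        }
    }
}
```

Transparent pixels cost nothing beyond a pointer bump and opaque runs are plain copies, which is the kind of loop a current compiler can be expected to handle well.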
Michael Jensen
Member #2,870
October 2002
|
Quote: (draw_rle_sprite seems to benefit most).
But on slower computers, too? I thought the whole point of the current state of draw_rle_sprite was the slower computers: it was optimized for them. Newer computers don't really need RLE sprites as far as speed goes.
|
HoHo
Member #4,534
April 2004
|
From my previous test: The difference is way bigger than A J reported in another thread about blit performance. I haven't got a slow computer ATM, but during the weekend I could test it on a P200. There is a slight possibility that I've done something terribly wrong with the tests, so if anyone could confirm the results I would appreciate it. __________ |
Michael Jensen
Member #2,870
October 2002
|
Quote: From my previous test:
But that wasn't on a 486 or anything, and I'm guessing we'd really need a legit 386 to see some results. Anyway, on anything faster than that, it's not a big difference: 200k more on a function that nobody uses except people who write games for 486s... Now, maybe my argument isn't valid, but I feel that draw_rle_sprite is only practical for people forced to use it... -- How many RLE sprites do you really need to draw in a second, anyway?
|
HoHo
Member #4,534
April 2004
|
You have a 486? If you do, please test it. A normal Pentium, Pentium II, older AMD or anything else older than my PC should be good too. In my X-Com engine I needed to draw about 2500 sprites per frame (for the map alone), with sizes ranging from 20x30 to 32x64. On a P200 I had to use compiled sprites because they were the only ones fast enough for the game to run at ~30fps. Because of that I had to use all kinds of hacks to get it working (a bigger back buffer was the most painful one). Using RLEs would have been much easier and would have saved a lot of memory. Unfortunately I couldn't finish the project. I had map and soldier rendering and animation working. If I had added a day and night cycle, pathfinding and AI, I couldn't have spent the whole time rendering, and the FPS would have dropped to half of the 30fps limit. [edit] Too bad I can't bump this thread. I ran some more tests. This time there are no CPU-specific optimizations; it's only C vs asm. I created a little table showing my results
As can be seen, C is mostly quite a lot faster than asm. Could someone with older-generation computers/compilers please run the tests too and post the results here? I'm seriously thinking of modifying the Allegro test so that the user only executes it and it runs most of the tests (or only selected ones, if I do it), so that testing different settings wouldn't be such a pain. [edit2] Added 8bit result __________ |
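To put numbers on the earlier "how many RLE sprites per second" question using the figures in this post: 2500 sprites per frame at 30fps is 75,000 sprites per second. The small helper below turns that into a per-sprite cycle budget; it assumes the P200's nominal 200 MHz clock and that rendering gets the entire frame, and `cycles_per_sprite` is a made-up name.

```c
#include <assert.h>

/* CPU cycles available per sprite if rendering took the entire frame. */
double cycles_per_sprite(double clock_hz, int sprites_per_frame, int fps)
{
    assert(sprites_per_frame > 0 && fps > 0);
    return clock_hz / ((double)sprites_per_frame * fps);
}
```

`cycles_per_sprite(200e6, 2500, 30)` comes to roughly 2667 cycles, or about 13 microseconds per sprite. For sprites of 20x30 to 32x64 pixels (600 to 2048 pixels) that leaves somewhere between about 1.3 and 4.4 cycles per pixel before any game logic runs, which explains why only compiled sprites were fast enough.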
A J
Member #3,025
December 2002
|
bump ;> ___________________________ |
|
|