CPU family compilation of allegro? |
Raf256
Member #3,501
May 2003
|
When compiling the Allegro game library, can I somehow tell it that I will run my Allegro-using programs only on, for example, an i486 or a Pentium, and not on an i386? If I understand correctly, some functions will then run quite a bit faster. But only some functions are that way (yes?). 1) Compile as -i386 using the default Allegro i386 build (linked statically), then have a small launcher check cpu_info and run the correct (fastest) version of the code. There might be a noticeable speed difference if one is doing lots of lower-level stuff like pixel manipulation for a particle system (as I do). |
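A minimal sketch of the launcher idea from the post above, under stated assumptions: the build names are hypothetical, and the `family` and `has_mmx` inputs are assumed to come from some CPU detection step (in Allegro 4 that would presumably be `cpu_family` and the `CPU_MMX` bit of `cpu_capabilities` after the library has been initialised). Only the selection logic is shown; actually starting the chosen program is platform specific and left out.

```c
/* Picks which pre-built executable a launcher should start.
   All file names here are hypothetical placeholders. */
const char *pick_build(int family, int has_mmx)
{
    /* family uses the usual x86 numbering:
       3 = i386, 4 = i486, 5 = Pentium, 6 = Pentium Pro and later. */
    if (family >= 6 && has_mmx)
        return "game-i686-mmx";
    if (family >= 5 && has_mmx)
        return "game-pentium-mmx";
    if (family >= 5)
        return "game-pentium";
    if (family == 4)
        return "game-i486";
    return "game-i386";   /* safe fallback that runs on everything */
}
```

The fallback branch matters most: anything the launcher cannot classify gets the i386 build, so an unknown CPU never receives instructions it may not support.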
HoHo
Member #4,534
April 2004
|
In the real world, the speed difference between different architecture targets is minimal. Allegro uses ASM internally, and ATM its most time-critical functions are in MMX and are way faster than any compiler could optimize it. In the future Allegro might have some functions using SSE/SSE2 too. A compiler will never get as good as hand-optimized code. __________ |
Kitty Cat
Member #2,815
October 2002
|
If you're using GCC, there's a way to make it compile optimized for a specific CPU family and not retain backwards compatibility. I believe it's when using make, set TARGET_EXCLOPTS="xxx" where xxx is i386, i486, i586 (or pentium), i686 (or pentiumpro), pentium2, pentium3, pentium4, athlon, athlon-tbird, athlon-xp... there may be more. These won't implicitly enable MMX or anything, although you can add -mmmx. So, for example, to compile exclusively for a Pentium w/ MMX or above (this excludes the Pentium Pro, which doesn't have MMX), do: If you're using MSVC, then I don't know. -- |
A J
Member #3,025
December 2002
|
Quote: and are way faster than any compiler could optimize it false. Quote: Compiler will never get as good as hand optimized code false. if you're using msvc, i have added (in the past few days) options to specify the chipset features. ___________________________ |
HoHo
Member #4,534
April 2004
|
Quote: and are way faster than any compiler could optimize it Then why are pure-C versions slower than the ASM versions of bitmap blitting, blending and the like? Quote: Compiler will never get as good as hand optimized code A human can always take the compiler's ASM output and optimize it even more. Of course, if the code is very simple then the compiler can create the best thing possible. __________ |
A J
Member #3,025
December 2002
|
Quote: Then why are pure-c versions slower than ASM versions of bitmap blitting, blending and the like?
they have not been written for a specific instruction set. Quote: Human can always take compiler's ASM output and optimize it even more.
humans can always take the toaster's toast and burn it even more than the toaster ever could. Quote: Of course if the code is very simple then compiler can create the best thing possible
so you go on to contradict yourself. ___________________________ |
HoHo
Member #4,534
April 2004
|
That leaves me with no other choice than to compile C and ASM versions of Allegro targeted at different platforms. It might take a while though, as I'll be busy for the next couple of days. __________ |
A J
Member #3,025
December 2002
|
try writing some SSE/SSE2 instructions, they are the next thing allegro will need for improved speed. ___________________________ |
HoHo
Member #4,534
April 2004
|
Can't the compiler do it? __________ |
Raf256
Member #3,501
May 2003
|
AFAIK good compilers, like GCC, will try quite hard to optimize the code, using additional registers and instructions... when compiling with the proper -march flag, of course. |
Tobi Vollebregt
Member #1,031
March 2001
|
Still, they don't generate MMX or SSE instructions. Just CMOV if you specify -march=pentiumpro (i686) or higher, IIRC. ________________________________________ |
Kitty Cat
Member #2,815
October 2002
|
Tobi, that's what -mmmx -msse -m3dnow etc are for. -- |
Evert
Member #794
November 2000
|
You can tell the compiler to generate SSEx or MMX instructions - at least I know you can with the Intel Fortran compiler. As for whether or not a human can do a better job at optimizing assembler code than a computer can: yes, they can indeed. A competent human generates better code than any compiler could - but it takes much more time, and high-level optimizations should be the first thing to do anyway. |
Tobi Vollebregt
Member #1,031
March 2001
|
Quote: -mmmx -msse -msse2 -msse3 -m3dnow See X86 Built-in Functions, for details of the functions enabled and disabled by these switches. To have SSE/SSE2 instructions generated automatically from floating-point code, see -mfpmath=sse. source: http://www.dis.com/gnu/gcc/i386-and-x86-64-Options.html See also: http://www.dis.com/gnu/gcc/X86-Built-in-Functions.html#X86%20Built-in%20Functions So IIRC only SSE can be generated automatically by using -mfpmath=sse, and not with -msse. I don't know about other compilers though. ________________________________________ |
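To illustrate what "built-in functions enabled by these switches" means in practice, here is a small sketch, not taken from Allegro. It assumes a GCC-compatible compiler that defines `__SSE__` when SSE is enabled and provides the standard `<xmmintrin.h>` intrinsics; the function name `add_floats` is made up for the example. Without SSE enabled, the same function compiles to the plain scalar loop.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics, available when SSE is enabled */
#endif

/* Adds two float arrays element by element: out[i] = a[i] + b[i]. */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Four floats per iteration, written explicitly with SSE built-ins. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop when SSE is not enabled. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

The point of the sketch is that the SSE path exists only because the programmer wrote it; the switch merely makes the built-ins usable.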
A J
Member #3,025
December 2002
|
for msvc7 /arch:SSE /arch:SSE2 -G7 ___________________________ |
HoHo
Member #4,534
April 2004
|
OK, I took the time and ran some initial tests. I downloaded the latest weekly CVS snapshot and compiled four different versions: static, asm, p4 (./configure --enable-static=yes --enable-staticprog=yes --enable-opts=pentium4 --enable-exclopts=pentium4) I used static linking to be sure Allegro doesn't use some dynamic library I have somewhere. I ran test and benchmarked memory bitmaps. I had to cut out some of the results (DRAW_MODE_XOR, DRAW_MODE_COPY_PATTERN, DRAW_MODE_SOLID_PATTERN and DRAW_MODE_MASKED_PATTERN) to fit my post into the 64 KB limit (originally it was ~125 KB). ML should add a warning when previewing a post that is too big. The results were very interesting.

//static, asm, p4
Allegro 4.1.19 (20050326), Unix profile results

Memory bitmap size: 800x600
Color depth: 16 bpp

DRAW_MODE_SOLID results:

putpixel() - 3863980
hline() - 2187607
vline() - 1439061
line() - 303152
rectfill() - 219069
circle() - 198101
circlefill() - 177439
ellipse() - 123467
ellipsefill() - 120098
arc() - 174638
triangle() - 140105

DRAW_MODE_TRANS results:

putpixel() - 2351360
hline() - 534863
vline() - 301393
line() - 160418
rectfill() - 15983
circle() - 92129
circlefill() - 8911
ellipse() - 70381
ellipsefill() - 9932
arc() - 153131
triangle() - 11200

Other functions:

textout() - 205332
vram->vram blit() - N/A
aligned vram->vram blit() - N/A
blit() from memory - 352902
aligned blit() from memory - 540591
vram->vram masked_blit() - N/A
masked_blit() from memory - 188838
draw_sprite() - 309636
draw_rle_sprite() - 482440
draw_compiled_sprite() - 1465194
draw_trans_sprite() - 124051
draw_trans_rle_sprite() - 85396
draw_lit_sprite() - 112701
draw_lit_rle_sprite() - 188568

//static, c, p4
Allegro 4.1.19 (20050326), Unix profile results

Memory bitmap size: 800x600
Color depth: 16 bpp

DRAW_MODE_SOLID results:

putpixel() - 4054055
hline() - 2254552
vline() - 1388449
line() - 451235
rectfill() - 207896
circle() - 284830
circlefill() - 188062
ellipse() - 155530
ellipsefill() - 125694
arc() - 253459
triangle() - 160083

DRAW_MODE_TRANS results:

putpixel() - 2456494
hline() - 743660
vline() - 528555
line() - 260635
rectfill() - 25764
circle() - 156132
circlefill() - 27282
ellipse() - 106101
ellipsefill() - 27136
arc() - 187759
triangle() - 32878

Other functions:

textout() - 193137
vram->vram blit() - N/A
aligned vram->vram blit() - N/A
blit() from memory - 447901
aligned blit() from memory - 426043
vram->vram masked_blit() - N/A
masked_blit() from memory - 264081
draw_sprite() - 421456
draw_rle_sprite() - 649046
draw_compiled_sprite() - 655934
draw_trans_sprite() - 159685
draw_trans_rle_sprite() - 194210
draw_lit_sprite() - 186170
draw_lit_rle_sprite() - 210518

//static, asm
Allegro 4.1.19 (20050326), Unix profile results

Memory bitmap size: 800x600
Color depth: 16 bpp

DRAW_MODE_SOLID results:

putpixel() - 4022308
hline() - 2233415
vline() - 1496657
line() - 226901
rectfill() - 221278
circle() - 202662
circlefill() - 177981
ellipse() - 100883
ellipsefill() - 121407
arc() - 178891
triangle() - 143064

DRAW_MODE_TRANS results:

putpixel() - 2410795
hline() - 458892
vline() - 419612
line() - 100949
rectfill() - 13648
circle() - 127500
circlefill() - 14698
ellipse() - 47123
ellipsefill() - 15687
arc() - 123835
triangle() - 18670

Other functions:

textout() - 168178
vram->vram blit() - N/A
aligned vram->vram blit() - N/A
blit() from memory - 365947
aligned blit() from memory - 558622
vram->vram masked_blit() - N/A
masked_blit() from memory - 196613
draw_sprite() - 321303
draw_rle_sprite() - 474702
draw_compiled_sprite() - 1461730
draw_trans_sprite() - 124654
draw_trans_rle_sprite() - 170162
draw_lit_sprite() - 116946
draw_lit_rle_sprite() - 183653

//static, c
Allegro 4.1.19 (20050326), Unix profile results

Memory bitmap size: 800x600
Color depth: 16 bpp

DRAW_MODE_SOLID results:

putpixel() - 4109923
hline() - 2165808
vline() - 1320638
line() - 420870
rectfill() - 180829
circle() - 273598
circlefill() - 182367
ellipse() - 158451
ellipsefill() - 122566
arc() - 233229
triangle() - 156225

DRAW_MODE_TRANS results:

putpixel() - 2491428
hline() - 619655
vline() - 497386
line() - 246436
rectfill() - 19470
circle() - 147090
circlefill() - 21321
ellipse() - 99359
ellipsefill() - 22116
arc() - 161832
triangle() - 25889

Other functions:

textout() - 165641
vram->vram blit() - N/A
aligned vram->vram blit() - N/A
blit() from memory - 363354
aligned blit() from memory - 384501
vram->vram masked_blit() - N/A
masked_blit() from memory - 208842
draw_sprite() - 357001
draw_rle_sprite() - 622629
draw_compiled_sprite() - 602795
draw_trans_sprite() - 120399
draw_trans_rle_sprite() - 157713
draw_lit_sprite() - 134902
draw_lit_rle_sprite() - 186415
__________ |
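For comparing two of the listings above, a tiny helper like the following may be handy. It is only a sketch: `percent_gain` is a made-up name, and it assumes the profile figures are operations per second, so that a higher number means a faster build (which is how the replies in this thread read them).

```c
/* Relative gain of build b over build a, in percent.
   Assumes both figures are operations per second (higher is faster). */
double percent_gain(double a, double b)
{
    return (b - a) / a * 100.0;
}
```

For example, with the DRAW_MODE_SOLID line() figures for the two p4 builds, percent_gain(303152, 451235) comes out at roughly +49% in favour of the C build, while for draw_compiled_sprite() percent_gain(1465194, 655934) is roughly -55%, so that one still goes clearly to the asm build.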
A J
Member #3,025
December 2002
|
so P4 'C' code beats the P4 asm code... i like it!!! ___________________________ |
Kitty Cat
Member #2,815
October 2002
|
Even the generic 'C' code gives the generic asm code a good fight. -- |
HoHo
Member #4,534
April 2004
|
Quote: i like it!!! Me too, but what would happen if someone good with asm took the compiler's output and tweaked it a bit more? Actually the asm is not p4 but pentium mmx one I think. IIRC GCC doesn't optimize asm files, so they get no benefit from compiling for a specific target architecture. One thing that's also interesting is that C with the default platform target (pentium) is practically as fast as the asm version. I guess the people who wrote it weren't excellent asm coders but only good ones (I wish I could be even a moderate one). __________ |
Kitty Cat
Member #2,815
October 2002
|
Quote: Actually the asm is not p4 but pentium mmx one I think. The hand-written ASM in Allegro, IIRC, is i386. I can't think of any place in the code that checks anyway, and without checking, running non-i386 code on a 386 will crash it (and, AFAIK, Allegro is i386 compatible by default). -- |
Evert
Member #794
November 2000
|
What's the difference between p4 asm and normal asm? Note that compilers have become better at optimizing code in recent years, and that things that were a good idea to do in assembler five years ago (which is about where Allegro's assembler source originates) are not necessarily a good idea on modern hardware anymore. |
Kitty Cat
Member #2,815
October 2002
|
I wouldn't doubt the P4 has extra op-codes that can be taken advantage of. Plus, perhaps extra registers and the like. As well, it could also be the way the asm is written to take advantage of the CPU's specific op-timings. -- |
HoHo
Member #4,534
April 2004
|
In P4 (SSE/SSE2) asm one could use data prefetch, the extra SSE registers and instructions, and lots of other things that don't exist on Pentium 1s and might speed stuff up. __________ |
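As a rough illustration of the data prefetch mentioned above, here is a sketch in C rather than asm. It assumes a GCC-compatible compiler, whose `__builtin_prefetch` built-in issues a prefetch hint where the target supports one; the function name and the `PREFETCH_AHEAD` distance are made up for the example and not tuned for any CPU.

```c
#include <stddef.h>

/* How many elements ahead to prefetch; an illustrative guess only. */
#define PREFETCH_AHEAD 16

/* Sums an array while hinting that data further ahead will be read soon. */
long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;
    size_t i;
    for (i = 0; i < n; i++) {
#if defined(__GNUC__)
        /* Read-only hint (0) with low temporal locality (1). */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 0, 1);
#endif
        total += data[i];
    }
    return total;
}
```

Whether this helps at all depends on the access pattern and the CPU; for a simple linear walk like this one the hardware may already do as well on its own, so it should be measured rather than assumed.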
X-G
Member #856
December 2000
|
Interestingly, whereas CPUs were previously constructed in such a way that it would be easy for humans to write good ASM, these days CPUs are designed with compilers in mind instead... -- |
HoHo
Member #4,534
April 2004
|
Cell should change that quite a bit. No CPU data prefetch, no cache misses* or instruction reordering. Writing optimized code should be a bit simpler. Also, it should be easier for a compiler to generate good code. *) There is no cache in the extra eight SPUs, but they do have an extremely fast local memory that the programmer can use directly to store data. __________ |