CPU family compilation of allegro?

Raf256

When compiling the Allegro game library, can I somehow tell it that I will run my Allegro-using programs only on, for example, an i486 or a Pentium, and not on an i386?

If I understand correctly, some functions would then run quite a bit faster.
Other functions are precompiled in several ways (with MMX, without MMX and so on) and the correct version is chosen at run time based on the cpu_info data, right?

But only some functions work that way (yes?).
If so, then perhaps it would be cool to have a script that builds several versions of Allegro, like
alleg.i386.lib
alleg.i486.lib
alleg.i586.lib
and so on (about 6 possibilities?). Then one can either use them through something like `allegro-configure --libs --march ik7', or even better, for bigger/serious projects, one can have an automatic builder that makes several versions of the game:

1) compiled as -i386 using the default Allegro i386 (linked statically)
2) as -i586 using Allegro i586 (statically)
and so on

then have a small launcher checking cpu_info and running the correct (fastest) version of the code. It might make a noticeable speed difference if one is doing lots of lower-level stuff like pixel manipulation for particle systems (as I do).
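A minimal sketch of the selection step such a launcher would need, assuming one statically linked build per CPU family. The file names (game.i386 and so on), the cpu_features struct and pick_build() are hypothetical names made up for this illustration and are not part of Allegro; a real launcher would fill in the flags from whatever CPU detection it has and then start the chosen file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature flags; fill them in from your CPU detection code. */
struct cpu_features {
    int family;   /* 3 = i386, 4 = i486, 5 = Pentium, 6 or more = i686 and newer */
    int has_mmx;  /* non-zero if MMX is available */
};

/* One entry per build shipped with the game, fastest first. */
struct build {
    const char *path;   /* hypothetical executable name */
    int min_family;     /* lowest CPU family that can run it */
    int needs_mmx;      /* non-zero if the build was compiled with -mmmx */
};

static const struct build builds[] = {
    { "game.i686", 6, 1 },
    { "game.i586", 5, 0 },
    { "game.i486", 4, 0 },
    { "game.i386", 3, 0 }   /* fallback that runs everywhere */
};

/* Return the first (fastest) build that the given CPU can run. */
const char *pick_build(const struct cpu_features *cpu)
{
    size_t i;
    assert(cpu != NULL);
    for (i = 0; i < sizeof builds / sizeof builds[0]; i++) {
        if (cpu->family >= builds[i].min_family &&
            (!builds[i].needs_mmx || cpu->has_mmx))
            return builds[i].path;
    }
    return builds[sizeof builds / sizeof builds[0] - 1].path;
}
```

With this table a Pentium Pro (family 6, no MMX) would fall through to game.i586, while anything with MMX and family 6 or later gets game.i686.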

HoHo

In the real world the speed difference between different architecture targets is minimal.

Allegro uses ASM internally and ATM its most time-critical functions are in MMX and are way faster than any compiler could optimize it. In the future Allegro might have some functions using SSE/SSE2 too. Compiler will never get as good as hand optimized code

Kitty Cat

If you're using GCC, there's a way to make it compile optimized for a specific chip family and not retain backwards compatibility. I believe it's, when using make, to set TARGET_EXCLOPTS="xxx" where xxx is i386, i486, i586 (or pentium), i686 (or pentiumpro), pentium2, pentium3, pentium4, athlon, athlon-tbird, athlon-xp... there may be more. These won't implicitly enable MMX or anything, although you can add -mmmx. So, for example, to compile exclusively for a Pentium w/ MMX or above (this excludes the Pentium Pro, which doesn't have MMX), do:
make TARGET_EXCLOPTS="pentium -mmmx"

If you're using MSVC, then I don't know..

A J

Quote:

and are way faster than any compiler could optimize it

false.

Quote:

Compiler will never get as good as hand optimized code

false.

if you're using msvc, i have added (in the past few days) options to specify the chipset features.
it's currently in CVS, so expect it in the next WIP version.

HoHo

Quote:

and are way faster than any compiler could optimize it
false.

Then why are pure-c versions slower than ASM versions of bitmap blitting, blending and the like?

Quote:

Compiler will never get as good as hand optimized code
false.

A human can always take the compiler's ASM output and optimize it even more. Of course, if the code is very simple then the compiler can create the best thing possible.

A J

Quote:

Then why are pure-c versions slower than ASM versions of bitmap blitting, blending and the like?

they have not been written for a specific instruction set.
the reason Allegro's ASM is quicker than Allegro's C code is that someone took the time to make it so. The ASM is not portable either, whereas the C code is portable, therefore you cannot compare them.

Quote:

A human can always take the compiler's ASM output and optimize it even more.

humans can always take the toaster's toast and burn it even more than the toaster ever could. ::)

Quote:

Of course, if the code is very simple then the compiler can create the best thing possible

so you go on to contradict yourself.
you seem to consider humans' code-optimizing abilities superior to a computer's, yet computers make far fewer mistakes than humans.

HoHo

It leaves me with no other choice than to compile C and asm versions of Allegro targeted at different platforms. It might take a while though, I'm going to be busy for the next couple of days.

A J

try writing some SSE/SSE2 instructions, they are the next thing allegro will need for improved speed.

HoHo

Can't the compiler do it? I don't think any compiler (besides the Intel one) can use SSE/SSE2 well to speed stuff up.

Raf256

AFAIK good compilers, like gcc, will try quite hard to optimize instructions and to use the additional registers and instructions... of course when compiling with the proper -march flag.

Tobi Vollebregt

Still, they don't generate MMX or SSE instructions. Just CMOV if you specify -march=pentiumpro (i686) or higher, IIRC.

Kitty Cat

Tobi, that's what -mmmx -msse -m3dnow etc. are for. Just note that those switches don't produce backwards-compatible code (using -msse will require SSE-capable CPUs, which even my Athlon-Tbird 1.1GHz isn't).

Evert

You can tell the compiler to generate SSEx or MMX instructions - at least I know you can with the Intel Fortran compiler.
Fortran compilers are typically better at optimizing than C compilers though...

As for whether or not a human can do a better job at optimizing assembler code than a computer can, yes they can indeed. A competent human generates better code than any compiler could, but it takes much more time and high-level optimizations should be the first thing to do anyway.

Tobi Vollebregt

Quote:

-mmmx
-mno-mmx

-msse
-mno-sse

-msse2
-mno-sse2

-msse3
-mno-sse3

-m3dnow
-mno-3dnow
These switches enable or disable the use of built-in functions that allow direct access to the MMX, SSE, SSE2, SSE3 and 3Dnow extensions of the instruction set.

See X86 Built-in Functions, for details of the functions enabled and disabled by these switches.

To have SSE/SSE2 instructions generated automatically from floating-point code, see -mfpmath=sse.

source: http://www.dis.com/gnu/gcc/i386-and-x86-64-Options.html

So IIRC only SSE can be generated automatically by using -mfpmath=sse, and not with -msse.

I don't know about other compilers though.
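To illustrate what "built-in functions" means in the manual text quoted above: according to that text, -msse2 makes the SSE2 built-ins available, and GCC exposes them through the intrinsics header emmintrin.h. The following is only a sketch and not Allegro code; add_saturate_u8() is a hypothetical helper that adds two byte buffers with saturation, using _mm_adds_epu8 when the compiler defines __SSE2__ and plain C otherwise.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>  /* SSE2 intrinsics, available when SSE2 is enabled */
#endif

/* Hypothetical helper: dst[i] = min(255, a[i] + b[i]) for n bytes. */
void add_saturate_u8(unsigned char *dst, const unsigned char *a,
                     const unsigned char *b, size_t n)
{
    size_t i = 0;
    assert(dst != NULL && a != NULL && b != NULL);
#ifdef __SSE2__
    /* 16 bytes per step; unaligned loads/stores keep the sketch simple. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop when SSE2 is not enabled. */
    for (; i < n; i++) {
        unsigned int sum = (unsigned int)a[i] + b[i];
        dst[i] = (unsigned char)(sum > 255 ? 255 : sum);
    }
}
```

The same source builds with or without -msse2; only the build that has SSE2 enabled requires an SSE2-capable CPU at run time.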

A J

for msvc7

/arch:SSE /arch:SSE2 -G7

HoHo

OK, I thought I'd take the time and do some initial tests. I downloaded the latest weekly CVS snapshot and compiled four different versions:

static, asm, p4 (./configure --enable-static=yes --enable-staticprog=yes --enable-opts=pentium4 --enable-exclopts=pentium4)
static, c, p4 (./configure --enable-static=yes --enable-staticprog=yes --enable-asm=no --enable-opts=pentium4 --enable-exclopts=pentium4)
static, asm (./configure --enable-static=yes --enable-staticprog=yes)
static, c (./configure --enable-static=yes --enable-staticprog=yes --enable-asm=no )

I used static linking to be sure allegro doesn't use some dynamic library I have somewhere

I ran the test program and benchmarked memory bitmaps. I had to cut out some of the results (DRAW_MODE_XOR, DRAW_MODE_COPY_PATTERN, DRAW_MODE_SOLID_PATTERN and DRAW_MODE_MASKED_PATTERN) to fit my post into the 64kb limit (originally it was ~125k). ML should add a warning when previewing your post when it's too big.

The results were very interesting.

//static, asm, p4
Allegro 4.1.19 (20050326), Unix profile results

Memory bitmap size: 800x600
Color depth: 16 bpp


DRAW_MODE_SOLID results:

    putpixel()      - 3863980
    hline()         - 2187607
    vline()         - 1439061
    line()          - 303152
    rectfill()      - 219069
    circle()        - 198101
    circlefill()    - 177439
    ellipse()       - 123467
    ellipsefill()   - 120098
    arc()           - 174638
    triangle()      - 140105


DRAW_MODE_TRANS results:

    putpixel()      - 2351360
    hline()         - 534863
    vline()         - 301393
    line()          - 160418
    rectfill()      - 15983
    circle()        - 92129
    circlefill()    - 8911
    ellipse()       - 70381
    ellipsefill()   - 9932
    arc()           - 153131
    triangle()      - 11200


Other functions:

    textout()                    - 205332
    vram->vram blit()            - N/A
    aligned vram->vram blit()    - N/A
    blit() from memory           - 352902
    aligned blit() from memory   - 540591
    vram->vram masked_blit()     - N/A
    masked_blit() from memory    - 188838
    draw_sprite()                - 309636
    draw_rle_sprite()            - 482440
    draw_compiled_sprite()       - 1465194
    draw_trans_sprite()          - 124051
    draw_trans_rle_sprite()      - 85396
    draw_lit_sprite()            - 112701
    draw_lit_rle_sprite()        - 188568

//static, c, p4
Allegro 4.1.19 (20050326), Unix profile results

Memory bitmap size: 800x600
Color depth: 16 bpp


DRAW_MODE_SOLID results:

    putpixel()      - 4054055
    hline()         - 2254552
    vline()         - 1388449
    line()          - 451235
    rectfill()      - 207896
    circle()        - 284830
    circlefill()    - 188062
    ellipse()       - 155530
    ellipsefill()   - 125694
    arc()           - 253459
    triangle()      - 160083

DRAW_MODE_TRANS results:

    putpixel()      - 2456494
    hline()         - 743660
    vline()         - 528555
    line()          - 260635
    rectfill()      - 25764
    circle()        - 156132
    circlefill()    - 27282
    ellipse()       - 106101
    ellipsefill()   - 27136
    arc()           - 187759
    triangle()      - 32878


Other functions:

    textout()                    - 193137
    vram->vram blit()            - N/A
    aligned vram->vram blit()    - N/A
    blit() from memory           - 447901
    aligned blit() from memory   - 426043
    vram->vram masked_blit()     - N/A
    masked_blit() from memory    - 264081
    draw_sprite()                - 421456
    draw_rle_sprite()            - 649046
    draw_compiled_sprite()       - 655934
    draw_trans_sprite()          - 159685
    draw_trans_rle_sprite()      - 194210
    draw_lit_sprite()            - 186170
    draw_lit_rle_sprite()        - 210518

//static, asm
Allegro 4.1.19 (20050326), Unix profile results

Memory bitmap size: 800x600
Color depth: 16 bpp


DRAW_MODE_SOLID results:

    putpixel()      - 4022308
    hline()         - 2233415
    vline()         - 1496657
    line()          - 226901
    rectfill()      - 221278
    circle()        - 202662
    circlefill()    - 177981
    ellipse()       - 100883
    ellipsefill()   - 121407
    arc()           - 178891
    triangle()      - 143064


DRAW_MODE_TRANS results:

    putpixel()      - 2410795
    hline()         - 458892
    vline()         - 419612
    line()          - 100949
    rectfill()      - 13648
    circle()        - 127500
    circlefill()    - 14698
    ellipse()       - 47123
    ellipsefill()   - 15687
    arc()           - 123835
    triangle()      - 18670


Other functions:

    textout()                    - 168178
    vram->vram blit()            - N/A
    aligned vram->vram blit()    - N/A
    blit() from memory           - 365947
    aligned blit() from memory   - 558622
    vram->vram masked_blit()     - N/A
    masked_blit() from memory    - 196613
    draw_sprite()                - 321303
    draw_rle_sprite()            - 474702
    draw_compiled_sprite()       - 1461730
    draw_trans_sprite()          - 124654
    draw_trans_rle_sprite()      - 170162
    draw_lit_sprite()            - 116946
    draw_lit_rle_sprite()        - 183653

//static, c
Allegro 4.1.19 (20050326), Unix profile results

Memory bitmap size: 800x600
Color depth: 16 bpp


DRAW_MODE_SOLID results:

    putpixel()      - 4109923
    hline()         - 2165808
    vline()         - 1320638
    line()          - 420870
    rectfill()      - 180829
    circle()        - 273598
    circlefill()    - 182367
    ellipse()       - 158451
    ellipsefill()   - 122566
    arc()           - 233229
    triangle()      - 156225


DRAW_MODE_TRANS results:

    putpixel()      - 2491428
    hline()         - 619655
    vline()         - 497386
    line()          - 246436
    rectfill()      - 19470
    circle()        - 147090
    circlefill()    - 21321
    ellipse()       - 99359
    ellipsefill()   - 22116
    arc()           - 161832
    triangle()      - 25889


Other functions:

    textout()                    - 165641
    vram->vram blit()            - N/A
    aligned vram->vram blit()    - N/A
    blit() from memory           - 363354
    aligned blit() from memory   - 384501
    vram->vram masked_blit()     - N/A
    masked_blit() from memory    - 208842
    draw_sprite()                - 357001
    draw_rle_sprite()            - 622629
    draw_compiled_sprite()       - 602795
    draw_trans_sprite()          - 120399
    draw_trans_rle_sprite()      - 157713
    draw_lit_sprite()            - 134902
    draw_lit_rle_sprite()        - 186415

A J

so P4 'C' code beats the P4 asm code...

i like it!!!

Kitty Cat

Even the generic 'C' code gives the generic asm code a good fight.

HoHo

Quote:

i like it!!!

me too, but what would happen if someone good with asm took the output of the compiler and tweaked it a bit more? Actually the asm is not p4 but pentium mmx one I think. IIRC gcc doesn't optimize asm files, so they get no benefit from compiling for a specific target architecture.

One thing that's also interesting is that C with the default platform target (pentium) is practically as fast as the asm version. I guess the people who made it weren't excellent asm coders but only good ones (I wish I could be even a moderate one :-[ )

Kitty Cat

Quote:

Actually the asm is not p4 but pentium mmx one I think.

The hand-written ASM in Allegro, IIRC, is i386. I can't think of any place in the code that checks anyway, and without checking, running non-i386 code on a 386 will crash it (and, AFAIK, Allegro is i386-compatible by default).

Evert

What's the difference between p4 asm and normal asm?
If there is none, then the noise level is quite high in that data and it's difficult to say which code is better.

Note that compilers have become better at optimizing code in recent years, and that things that were a good idea to do in assembler five years ago (which is about where Allegro's assembler source originates) are not necessarily a good idea on modern hardware anymore.

Kitty Cat

I wouldn't doubt the P4 has extra op-codes that can be taken advantage of. Plus, perhaps extra registers and the like. As well, it could also be the way the asm is written to take advantage of the CPU's specific op-timings.

HoHo

In P4 (SSE/SSE2) asm one could use data prefetch, extra SSE registers and instructions, together with lots of other things that don't exist on Pentium 1s and might speed stuff up.

X-G

Interestingly, whereas CPUs were previously constructed in such a way that it would be easy for humans to write good ASM, these days CPUs are designed with compilers in mind instead...

HoHo

Cell should change that quite a bit. No CPU data prefetch, no cache misses* or instruction reordering. Writing optimized code should be a bit simpler. Also it should be easier for a compiler to generate good code.

*) There is no cache in the extra eight SPUs, but they do have an extremely fast local memory that the programmer can use directly to store data in.

A J

as the asm code is now possibly the cause of slow code, it might be time to review its usefulness.
how about an option to turn it off?
is there such an option?
it would also reduce the msvc build's dependency on gcc.

HoHo

./configure --enable-asm=no

[edit]

Someone with windows access should compile several versions of allegro and test(static linked). Then several people should run these on different machines to see how big of a speed gain/loss is it to use c-only vs asm.

gillius

I'd like to see HoHo's test run on MSVC 7.1 with SSE or SSE2 enabled. I have personally seen it generate SSE asm code for floating point operations (are they used in Allegro, even?). In my Direct3D game, IIRC, SSE2 increased the performance by 10-20%, but the game was heavy on vector math, which is what SSE was meant for.

HoHo

Quote:

I'd like to see HoHo's test run on MSVC 7.1 with SSE or SSE2 enabled.

I don't quite get what you mean by this. Do you want me to run allegro test compiled with msvc7.1? If so then I guess I could install XP on my comp's spare partition to test it

[edit]
I could also test-run it on my work PC (also a P4). It has both Linux and XP.

The only thing is that I haven't set up my development environment in XP, so it would take quite some time to get things compiling in it.

gillius

I didn't mean you specifically, or even anyone, should do it. I just said simply that I was curious to see the results of MSVC 7.1's optimizations for P4 over GCC's optimizations. If MSVC can leverage SSE better than GCC, then perhaps we will see the C code far outperform the ASM code on MSVC while on GCC it really is still quite a toss-up (although enough of a toss-up that I'd say that had I only the C code now I wouldn't waste my time writing an ASM version).

Oscar Giner

Quote:

ALLEGRO_USE_C=1 (GCC-based platforms only)

So how can I do the c only version of allegro with MSVC?

TARGET_ARCH_EXC is also for gcc only, but I just modified the makefile directly. I added /G7 /arch:SSE2 to CFLAGS.

A J

oscar, you might want to use /LTCG also.
whole program optimization, which does some inlining across compilation units.

Raf256

I think it might be a good idea to change some ASM code to C now, because "write in asm for speed" is becoming a bit of an urban legend today. I'm not an expert on the subject, but I would really suggest trying:
- using normal C, not ASM
- then compiling it for the right architecture (-march and so on)
- (EDITED) oh, one thing in fact should be done by hand, AFAIR: the MMX/SSE usage. But it can be done in GCC (no need for asm) using neat macros, which by the way will expand either to SSE opcodes or to i386 opcodes and so on, depending on the selected architecture, which makes the build process (see below) and maintenance much easier \o/ I can't find the link to the page describing them right now though (anyone? those were macros like VS8_ADD(..) or something?)

- running a profiler and then using its output to optimize more (can this be run when building Allegro? It would require writing a test function that uses all available functions a few times to give the profiler a chance to analyze the code, right?)

- adding a builder to make several versions of Allegro, like i386, 486, 586 and so on. Then the correct version of the library will be loaded (a game should be distributed with alleg.i386.dll, alleg.i586.dll and so on, and perhaps the same for *.so on Linux), or
the author of an Allegro-using game can also make a multi-builder (to make game.i386.exe, .i486.exe and so on) and then statically link the same version of Allegro (link the i486 version of Allegro into his code while building it for i486 and so on).

I think it might give a noticeable speed boost, which is the best thing in Allegro right after its win32/linux/... portability.
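The macros mentioned in the list above are not identified anywhere in the thread, so the following is only a guess at the general technique, not the page being remembered: GCC's generic vector extensions (`__attribute__((vector_size(N)))`) let you write SIMD-style arithmetic in plain C, and the compiler lowers it to vector instructions or to ordinary scalar code depending on the selected architecture flags. The names v8hi and add_8x16() are chosen for this sketch, and it assumes GCC or a compatible compiler.

```c
#include <assert.h>
#include <string.h>

/* Eight 16-bit lanes in one 128-bit vector (GCC/Clang extension). */
typedef short v8hi __attribute__((vector_size(16)));

/* Hypothetical helper: dst[i] = a[i] + b[i] for exactly 8 shorts. */
void add_8x16(short *dst, const short *a, const short *b)
{
    v8hi va, vb, vr;
    assert(sizeof(v8hi) == 8 * sizeof(short));
    /* memcpy avoids alignment and aliasing assumptions about the inputs. */
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vr = va + vb;  /* one element-wise add across all eight lanes */
    memcpy(dst, &vr, sizeof vr);
}
```

The source stays the same for every target; only the compiler flags decide whether the add becomes a single vector instruction or eight scalar ones.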

Evert

Quote:

So how can I do the c only version of allegro with MSVC?

Ideally you could, but I think the GDI driver (and maybe the DirectX one too) uses inline assembler anyway, thus making it impossible to build the C only version of the library with MSVC alone.

Quote:

because "write in asm for speed" is becoming a bit of an urban legend today

Not quite. The output of a compiler for a modern processor is going to beat hand-optimized assembly targeting a 386 though.

Quote:

But it can be done in GCC (no need for asm) using neat macros, which by the way will expand either to SSE opcodes or to i386 opcodes and so on, depending on the selected architecture, which makes the build process (see below) and maintenance much easier \o/ I can't find the link to the page describing them right now though (anyone? those were macros like VS8_ADD(..) or something?)

Fine, but someone has to do the work (and make sure the C only version remains processor neutral in the process).

Quote:

running a profiler and then using its output to optimize more (can this be run when building Allegro? It would require writing a test function that uses all available functions a few times to give the profiler a chance to analyze the code, right?)

This is always done before applying a patch designed to increase code speed. Also check the timing functions in the test programme.

Quote:

adding a builder to make several versions of Allegro, like i386, 486, 586 and so on. Then the correct version of the library will be loaded (a game should be distributed with alleg.i386.dll, alleg.i586.dll and so on, and perhaps the same for *.so on Linux), or
the author of an Allegro-using game can also make a multi-builder (to make game.i386.exe, .i486.exe and so on) and then statically link the same version of Allegro (link the i486 version of Allegro into his code while building it for i486 and so on).

This complicates the build process considerably and makes everything harder to maintain - especially since Intel processors are not the only target for Allegro (AMD 64 and Macintosh systems being the main alternatives). It also makes the build process pretty slow, since everything has to be compiled multiple times. I also wouldn't take kindly to a game I download coming with binaries for ten different types of processor.
Distributing shared object files on Linux is a big no-no, by the way.

Bob

Quote:

Writing optimized code should be a bit simpler.

You wish. My guess is that programming for the Cell is going to be like programming for the Emotion Engine, except 10x more difficult.

Things like "caches" and "virtual memory" don't make optimizing harder, they make optimizing a whole lot easier! You don't have to worry (much) about where your data is and in what order you access it; the CPU figures it out for you.

HoHo

Actually I meant really optimized code. Something like in Pixomatic

if it's not so easy then all we can do is hope that they have a hell of a good compiler for it (or at least a huge and useful manual)

Bob

Quote:

if it's not so easy then all we can do is hope that they have a hell of a good compiler for it

Sure, in 10-15 years, just like every other architecture. There is nothing in CELL that makes compiler writers' life easier.

Quote:

(or at least a huge and useful manual)

Likely. Not sure how useful that would be.

Quote:

Actually I meant really optimized code. Something like in Pixomatic

I'll wait till I see it run on the Cell.

HoHo

What might help the compiler is the fact that it has an in-order core. If the compiler knows exactly how the underlying CPU works, it makes optimizing a bit easier.

With Pixomatic I meant that the guys who programmed it used a lot of extreme optimization tricks to speed it up as much as possible, but still quite often the program worked a bit differently than they thought because the CPU reordered some instructions and altered delays.

There have been three long articles about how they developed the Pixomatic engine in Dr. Dobb's Journal.

Raf256

IMHO my idea of building several versions is good: just make it an option and it's a win-win situation.

By default liballegro will build quickly in 386 mode.

With ./configure --multi-arch the build will take longer, but all (or all selected) libs will be built.
Then the user might make several versions of his game; if it is on a CD then I do not care, I can throw in even 10 .exe files and 10 Linux ELF executables.
Don't want this feature? Just do not use it, easy.

HoHo

You can always statically link your executable and provide ~20 different versions of it. If you want to be smart, you create one program that checks the CPU capabilities and then launches the fastest program file.

A J

Quote:

my idea of building several versions is good

and who is going to write these versions ?

HoHo

He meant building a separate version for every CPU architecture:
i386, i486, pentium3, athlon-xp, pentium4 and all the others you can define with -mtune

Kitty Cat

-march would have a better effect than -mtune. -mtune still produces backwards-compatible code, so using, say, -mtune=pentium2 would still produce i386-compatible code, but tweak it to run better on P2s (which could be just as good for a P3 or Athlon-XP, given the must-be-i386-compatible restrictions). And why build arch-dependent versions if you're only likely to use one?

Michael Jensen

You could simply make Allegro build several dynamically linked libraries, one for 386, 486, P5, etc... and just implement the entire API with function pointers, except allegro_init(), which could set up the API when the program starts and calls it, based on what kind of beast you actually have... only the API would be optimized and not your user code though, and the DLLs/SOs/whatevers would be huge. -- oh, and it would be a waste of time since Allegro is already fast enough...
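A small sketch of the function-pointer idea in C. Everything here is hypothetical and much simpler than a real library: copy_row stands in for an API entry point, dispatch_init() for the set-up that allegro_init() would do in this proposal, and cpu_is_modern for the result of real CPU detection.

```c
#include <assert.h>
#include <stddef.h>

/* Two hypothetical builds of the same routine: dst[i] = src[i] for n bytes. */
static void copy_row_generic(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

static void copy_row_fast(unsigned char *dst, const unsigned char *src, size_t n)
{
    /* Stand-in for a CPU-specific version; here it is simply unrolled by 4. */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
    for (; i < n; i++)
        dst[i] = src[i];
}

/* The public entry point is a pointer that the init code fills in once. */
void (*copy_row)(unsigned char *, const unsigned char *, size_t) = NULL;

/* Hypothetical init hook: cpu_is_modern would come from real CPU detection. */
void dispatch_init(int cpu_is_modern)
{
    copy_row = cpu_is_modern ? copy_row_fast : copy_row_generic;
    assert(copy_row != NULL);
}
```

User code calls copy_row(dst, src, n) as if it were a normal function; the cost is one indirect call per API call, and as noted above only the library routines get the CPU-specific treatment.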

HoHo

One reason there is probably no point in creating several versions of the library is that it mostly benefits only newer machines, which are mostly fast enough already.

One thing we could think about is replacing some currently asm-based functions with their C counterparts where those are faster (draw_rle_sprite seems to benefit most).

Michael Jensen

Quote:

(draw_rle_sprite seems to benefit most).

but on slower computers, also?

I thought the whole point of the current state of the draw_rle_sprite was for the slower computers, it was optimized for them, as for newer computers, they don't really need RLE sprites as far as speed goes.

HoHo

From my previous test:
statically linked allegro with asm routines, no special compilation flags on my home PC:
draw_rle_sprite() - 474702
Same but c-only:
draw_rle_sprite() - 622629

Difference is way bigger than A J reported in another thread about blit performance

I haven't got a slow computer ATM but during the weekend I could test it on a p200.

There is a slight possibility that I've done something terribly wrong with the tests, so if anyone could confirm the results I would appreciate it.

Michael Jensen

Quote:

From my previous test:
statically linked allegro with asm routines, no special compilation flags on my home PC:
draw_rle_sprite() - 474702
Same but c-only:
draw_rle_sprite() - 622629

but that wasn't on a 486 or anything, and I'm guessing we'd really need a legit 386 to see some results; Anyway, on anything faster than that, that's not a big difference, 200k more on a function that nobody uses except people who write games for 486s... Now, maybe my argument isn't valid. But I feel that draw_rle_sprite is only practical for people forced to use it... -- How many RLE sprites do you really need to draw in a second, anyway?

HoHo

You have a 486? If you do please test it. A normal pentium, pentium2, older amd or whatever else older than my pc should be good too.

In my X-Com engine I needed to draw about 2500 sprites per frame (only for the map), with sizes ranging from 20x30 to 32x64. On a P200 I had to use compiled sprites because they were the only ones fast enough for the game to run at ~30fps. Because of that I had to use all kinds of hacks to get it working (a bigger back buffer was the most painful one). Using RLEs it would have been much easier and it would have saved a lot of memory.

Unfortunately I couldn't finish the project. I had map and soldier rendering and animation working. If I had added day and night cycle, pathfinding and AI I couldn't have spent the whole time rendering and then FPS would have dropped to half of the 30fps-limit.

[edit]

Too bad I can't bump this thread.

I ran some more tests. This time there are no CPU-specific optimizations, it's only C vs asm. I created a little table showing my results:

P4 3.0@3.82 1M cache
512M ram single channel
2.6.11-gentoo-r4
gcc version 3.4.3 20050110

Resolution 800x600, driver X in all columns. "Diff %" is the speed difference in %; >0 means C is better.

                                    32 bpp          |          24 bpp          |          16 bpp          |           8 bpp
Function                         C      Asm  Diff % |       C      Asm  Diff % |       C      Asm  Diff % |       C      Asm  Diff %
textout()                   171212   213753   -19.9 |  155685   263032  -40.81 |  191343   208104   -8.05 |  194637   196086   -0.74
blit() from memory          453115   300027   51.02 |  185657   264728  -29.87 |  466279   388827   19.92 |  363734   539238  -32.55
masked_blit() from memory   229041   163688   39.93 |  129498    86325   50.01 |  232047   231639    0.18 |  198995   269500  -26.16
draw_sprite()               416321   297268   40.05 |  245417   249798   -1.75 |  408864   344216   18.78 |  320361   495931   -35.4
draw_rle_sprite()           717764   421222    70.4 |  640240   387080    65.4 |  708827   517834   36.88 |  674117   508662   32.53
draw_compiled_sprite()      717493  1661973  -56.83 |  642139  1245370  -48.44 |  710376  1710608  -58.47 |  663957  2035349  -67.38
draw_trans_sprite()         176353    75453  133.73 |  124666    44480  180.27 |  164053    46482  252.94 |  353072   464093  -23.92
draw_trans_rle_sprite()     192268    88702  116.76 |  171235    47175  262.98 |  199796   209313   -4.55 |  432254   344263   25.56
draw_lit_sprite()           182740   153953    18.7 |  133052   140253   -5.13 |  178109   148202   20.18 |  325448   268232   21.33
draw_lit_rle_sprite()       204653    91639  123.33 |  191327    43380  341.05 |  222652    46045  383.55 |  622470   548296   13.53

As can be seen, C is mostly quite a lot faster than asm.
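For anyone adding their own rows: the "speed difference in %" figures in the table are consistent with (C score / asm score - 1) x 100, where a higher score is faster. For example, the 32-bit blit() from memory row gives (453115 / 300027 - 1) x 100, which is about 51.02. A tiny helper for that, with speed_diff_percent() being a name invented for this sketch:

```c
#include <assert.h>

/* Relative difference of the C score over the asm score, in percent.
   Scores are the profile counts from the table (higher is faster),
   so a positive result means the C build was faster. */
double speed_diff_percent(double c_score, double asm_score)
{
    assert(asm_score > 0.0);
    return (c_score / asm_score - 1.0) * 100.0;
}
```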

Please someone with older generation computers/compilers run the tests too and post your results here.

I'm seriously thinking of modifying the Allegro test so that the user only has to execute it and it runs most of the tests (or only selected ones, if I do it), so that it wouldn't be such a pain to test different settings.

[edit2]

Added 8bit result

A J

bump ;>

Bob

If we eliminate draw_compiled_sprite (which doesn't count when doing a C/asm comparison), and the 24-bit formats that the compiler has trouble with, it looks like C-only is a net win for 32 and 15/16-bpp.

That's awesome!

Elias

Yes. It means, we can safely remove all the asm in the 4.3.x tree. So, with all the djgpp stuff gone and the asm stuff gone, the 4.3.0 Allegro code will not be recognizable at all.. from big and hackish it should go to really nice

HoHo

Please don't remove compiled sprites. If one (me) needs absolute blitting speed, they are the way to go. With anything else you can do whatever you want.

Someone with a bit better ASM knowledge should find out what makes some functions slower in C than in asm, especially at 24-bit and 8-bit depth.

Also, there should be a few more tests before we make any assumptions about speed. E.g. I don't know how fast MSVC is. If it's considerably worse at optimizing, some might not like it if the asm routines are gone.

Michael Jensen

Ummm, in the test results, the asm draw_compiled_sprite still kicked its C counterpart's ass, and both draw_rle_sprite versions too, so even with the argument you made I think you'd still be stuck using draw_compiled_sprite... I could probably put together a 486, I've got enough junk that needs to be tossed out, but I doubt I will... All I'm saying is that at one point the asm had to be faster than the C, why else would they have used it? It could be that this is only on older processors, or perhaps on older compilers that didn't optimize as well... anyway...

Elias

Yes. I'm not too convinced yet either. It must be only about a year since we added the asm color converters to Unix, and from what I remember, they brought some speed gain. So I'm not too sure.. but if further tests all show that the C versions are faster, we'll remove the asm.

About compiled sprites (as well as RLE sprites): I don't have the impression they are that fast, especially when not used on memory bitmaps. And remember, for 4.3.x, we also want to complete the AGL vtable, so for systems which have OpenGL available, we will have a lot more HW accelerated stuff than currently as well.

Matt Smith

My attempts at installing 98SE on my new machine have failed, so I can't test my old 8 bit asm blitting on a (relatively) modern Sempron. It was clearly 3 times faster than the equivalent code in C on my Celeron. The 16 bit test was 1.5 times faster, and the 32 bit was the same.

This was code written 386 style, with no regard for pipeline stalls or cache, and compiled with gcc 2.95. I think gcc traditionally has a problem with 8 & 16 bit, but maybe it is getting better.

I am no longer in favour of supporting DOS with 4.3. 4.0 is the finished article for DOS. Most of the new features don't make any sense on a single-tasking non-accelerated machine anyway.

I'm basing the console ports on 4.0.3, rather than WIP, originally to fix the feature set (which needs further trimming for GBA and PS1) but now that 4.2 doesn't compile with gcc 2.95 anymore I'm glad I did, because gcc 2.95 is all you can get for most of these consoles.

Evert

Quote:

now that 4.2 doesn't compile with gcc 2.95 anymore

Actually, I think that's a bug. Not sure how easy it will be to fix though.

About C vs asm, I'd be interested to know why the asm code performs so much worse before tearing it out (although my own interest in it is minimal, since it doesn't compile on an AMD64 in native mode anyway).

Richard Phipps

Now that PC's are getting more powerful I think we could sacrifice some speed if C code would be easier to maintain and develop.

I could understand why for 386 & 486's it was very useful to get the best speed possible, but I don't think this is so important now.

ReyBrujo

That specific ASM code has become obsolete, that is what it means. The compiler is able to find much better optimizations than the ones used in those files.

Elias

Yeah. If someone contributes optimized asm code, we certainly will keep it, or even put it in again during 4.3.x after the old was thrown out

Richard Phipps

So is this a conscious decision to leave 4.0.3 more suitable for DOS and older PC's, and move the 4.2/4.3 branches into being optimised or aimed more for modern PC's?

Elias

4.2.0 will fully support DOS, and also have all the asm code (but maybe the C version should get default?). The next major release will be 4.4.0, which won't support DOS, and have the new gfx API and events API and so on - but with a compatibility layer so (most) 4.2.0 programs will also work with 4.4.0. 4.3.x are the new WIP versions after 4.2.0 has been released.

About the decision.. whatever will be done will be done

There never was a decision to remove DOS support, but 4.3.x in CVS doesn't compile with djgpp, and nobody is interested in fixing it - so that's how the support was dropped.

In the asm case it's different, we maybe need to decide to remove it. But I guess, we can just wait until someone implements the new gfx api vtable entries.. which will be written in C. If there's nobody to write asm versions (and the current asm routines apparently are not worth porting) - asm will have been dropped.

Richard Phipps

Thanks for that, Elias. Oh, do the newer versions of GCC produce significantly better optimisations than 2.95? I wonder how close the compiled C code is to any new handwritten optimised ASM?

Michael Jensen

Quote:

Now that PC's are getting more powerful I think we could sacrifice some speed if C code would be easier to maintain and develop.

I completely agree with this... especially after reading Evert's AMD64 statement.

Quote:

do the newer versions of GCC produce significantly better optimisations than 2.95?

I have no actual education on this topic, but I would think that it's a combination of things: New hardware architectures, so old optimized asm doesn't work as well... Newer compilers that optimize better than the old ones... and optimize better for newer architecture than old ones did for the old architectures.... (generally anyway, I'm thinking about more than just GCC 2.95, etc)

edit:
also, while compiling/installing allegro 420, I noticed a lot of switches with 586 in them, possibly i586? not sure, anyway, could it be that code produced with this specific version of the library wouldn't even run on 3/486s?

Peter Wang

Quote:

HoHo, please do! The more and better tools we have, the more often we will bother to do profiling.

Kitty Cat

Quote:

not sure, anyway, could it be that code produced with this specific version of the library wouldn't even run on 3/486s?

If it's compiled with -mcpu or -mtune, it will still run on 3/486s. But if it's -march, it'll require the specified CPU or better. By default, I believe Allegro compiles with -mcpu=pentium (so it's Pentium-optimized, but will work on older CPUs). Personally, I compile Allegro with -march=athlon-tbird -mmmx -m3dnow (plus some GCC 3.4-or-better switches, like -fweb -frename-registers -funswitch-loops), so it'll require an Athlon-TBird or better, with MMX and 3DNow. Since I'm using the Linux shared lib though, it won't be distributed (or if it ever is, I can build a more compatible version).

Michael Jensen

Kitty: If you don't specify march options, will you get any kind of performance increase for having those technologies? (3dnow, mmx, etc) as opposed to someone who doesn't? or does the extra hardware sit dormant?

also: do all modern* computers have mmx? (modern=high end p2+)

HoHo

I think in the Intel line, the Pentium 1 was the last one where not all models had MMX.

Evert

Quote:

Now that PC's are getting more powerful I think we could sacrifice some speed if C code would be easier to maintain and develop.

Well, it certainly helps portability... might be something to consider I suppose.

Quote:

There never was a decision to remove DOS support, but 4.3.x in CVS doesn't compile with djgpp, and nobody is interested in fixing it - so that's how the support was dropped.

Actually, some people seem to be interested. If they should choose to maintain the DJGPP port and keep it up to date with the rest, then I see no reason to say `no, we want to drop this'.
Which reminds me, is the old Mac version still distributed along with the rest? Does that even compile? I suspect no one has actually touched it in five years or so...

HoHo

Peter Wang said:

HoHo, please do! The more and better tools we have, the more often we will bother to do profiling.

Is it OK if I don't follow the usual manner of the Allegro examples and tests there are so far? I would like to use some C++ features (STL and perhaps a bit of OO) and split it between several files. Also, may I use some better GUI library than Allegro's built-in one?

Evert

Quote:

is it OK if I don't follow the usual manner of Allegro examples and tests that there are so far?

Depends a bit on to what extent they're different. If we want to fit it into Allegro itself, then I tend to say no.

Quote:

I would like to use some c++ features (STL and perhaps a bit OO) and split it between several files.

May or may not be ok, as far as the C++ part goes. I would personally say no here, but others are free to disagree of course.

Quote:

Also, may I use some better Gui library than Allegro builtin one?

What on Earth for? A test programme doesn't really need a neat looking GUI, the standard Allegro one should be fine.
Certainly if we want it to be part of Allegro it should have no external dependencies other than Allegro itself.

That said, do you want it to be a part of Allegro? You could release it as a separate package. Then again, I probably would not use it myself very often if I had to recompile it (and add-on libraries) every time I want to check something that was just changed within Allegro.

HoHo

Ok, I think I can do it with c and allegro gui but I would really want to split it to several files. I don't like to scroll around in a 4500+ line file to find something.

I don't care whether it's included in Allegro or not, that's up to the ones who decide stuff like this. But before deciding anything, I'll first try actually doing it and then we'll see how good it really comes out.

Thomas Fjellstrom

If it were a part of the library, as in part of the API, I'd say no C++, but as it's a tool, who cares?

Evert

Quote:

I would really want to split it to several files

My no was to the C++ part, not the multiple files part.

HoHo

[EDIT3]
The results might be wrong. I'm guiding my brother through the tests via MSN. He redid the C 24-bit version and got some results 2-4x faster and others 2-4x slower. I'm investigating this.
[edit4]
That's weird. My brother ran the 24-bit tests once again and now the C version is ~10% slower than before, whereas asm is ~10% faster than before. I think on a computer this slow any kind of activity drags the results down. I guess I can't get any good measurements until the weekend, when I go home.

I know, it's just that so far all Allegro examples and demos have been in single files (there was talk that new demos will be separate).

[edit]

I've got some new benchmarking results, this time from a pentium mmx 200@225, Gentoo, gcc 3.3.4

p200@225 Mmx                   X with KDE
i386-pc-linux-gnu-3.3.4           800x600            Difference,                      Difference,                      Difference,                      Difference,
                                   32 bit               >0 means      24bit              >0 means      16bit              >0 means       8bit              >0 means
                                        C        ASM     C faster          C        ASM    C faster          C        ASM    C faster          C        ASM    C faster
textout()                           12407      12381        0.21       6350       6604      -3.85      11386      13061     -12.82      12167      14424     -15.65
blit() from memory                   7777       7499        3.71       3035       6557     -53.71       8801      17335     -49.23       7696      17453      -55.9
masked_blit() from memory           10692       9175       16.53       4199       4727     -11.17       9693      11086     -12.57       9763      16606     -41.21
draw_sprite()                       23794      19076       24.73      10843      13262     -18.24      20111      19675       2.22      23727      25422      -6.67
draw_rle_sprite()                   24229      24977       -2.99      15125      19547     -22.62      26824      29598      -9.37      24799      18234         36
draw_compiled_sprite()              24020      33138      -27.52      15758      29037     -45.73      27069      49774     -45.62      25029      15631      60.12
draw_trans_sprite()                  7928       3791      109.13       5473       3889      40.73       6713       4992      34.48      14308      26334     -45.67
draw_trans_rle_sprite()              7019       4131       69.91       6677       4379      52.48       7167       5598      28.03      15138      13690      10.58
draw_lit_sprite()                    7575       3826       97.99       5543       3786      46.41       7686       4929      55.93      22386      24294      -7.85
draw_lit_rle_sprite()                7392       4279       72.75       6107       4349      40.42       7356       5692      29.23      21654      29968     -27.74

More bitdepths to follow.

I think it's pretty safe to make C the default compile target if other platforms/compilers show similar results.

[edit2]

Added other depths too.

As it seems, the other bit depths don't benefit as much from the C-only solution :-/ I guess the lack of branch prediction and the in-order execution give the biggest hit. Anyone got a P2 to run the tests?

The trans/lit ones seem to get a speed boost almost everywhere, whereas 24-bit is much slower in general blitting/sprites.

I wonder how hard it would be to find out the CPU type at run time and modify the vtables accordingly to get the fastest function possible. Probably not so easy, and it gets questionable once OpenGL is integrated.

Michael Jensen

Since I've got a closet full of P1s and older machines, I might as well, but what I don't have is a closet full of hard drives or RAM. So if someone wants to point me towards a fairly light version of Linux that would run blazingly fast on such a computer and still work with Allegro, I would be more than willing to test...

hardware I probably have
586: ~100 MHz, 8-16 MB of RAM, 300-500 MB disk space
486: <100 MHz, 4-16 MB of RAM, 300-500 MB disk space

should be able to find a cd-rom, and either a VLB card for the 486,
or hopefully a PCI one for the 586...

let me know...

edit:
I've also got a 250mb parallel port zip drive if a suitable version of linux supports booting off of that...

edit2:
I've got a system setup
586: 100 MHz, 32 MB of RAM, 325 MB HDD, complete with working CD-ROM and 3.5" drive. This one is a "customized" (as in just now, to get it working) HP Vectra, with on-board video, probably on the PCI bus.

I also have a 486 that I can probably do next, but only one at a time.

I had to unhook another computer to get a spot to hook this one up so I can't have it running for more than a couple of days... At this point I need an OS, any suggestions?

soniCron

I'm curious. I'm seeing a lot of discussion about whether or not to support the i386 ASM code. But the compiler-optimized code is faster, and people are having to make an effort to get even a 486 computer to test stuff on. Since DOS support is being dropped (incidentally or on purpose) for 4.3x, why even bother with the i386 ASM? I believe it has long outlived its days and is worth retiring, especially if that makes the code easier to maintain. And I have a hunch that it would be much easier, since (and this is only from what I can understand) nobody seems to be maintaining it anyway. It IS still 386 code, after all, and nobody seems to have been interested in updating it.

Now, don't get me wrong, I'm not knocking the quality of the code. It's far more organized and well written than anything I could dream of writing. But it just seems, to me, that keeping this legacy code in it, that isn't being updated when the rest of the system is, is pointless. If someone is going to be using Allegro in DOS on a 386DX, (any takers? ANYone?) the 4.03 release should be more than sufficient. I just read the changelog for everything between 4.03 to 4.1.18, and there were no references to any fixes to any 386 or DOS code. In fact, quite the contrary.

There was a significant portion of fixes BECAUSE of the 386 code!! This seems a little counterintuitive to me. Someone mentioned not worrying about the DOS support in 4.3x because it doesn't make much sense to support an unaccelerated platform. I'd say the 386 follows the same...

Now, I've been using Allegro for years and years and years. I plan on getting involved in the development of the core system once I have some free time to do so. So please don't think I'm knocking the development team, or anything of the like. I have very much respect for those of you that work on Allegro. But I cannot, for the life of me, understand why there's even a discussion of whether or not to update the code from 386 support! Chuck it.

My two cents.

Elias

Quote:

But I cannot, for the life of me, understand why there's even a discussion of whether or not to update the code from 386 support!

The tests are not to decide whether to drop the code or not in case C is always faster, but to decide whether it is (and maybe why). It would be very unwise to throw out that asm code based on 1 or 2 tests, just to discover later that there was some reason the asm code was slower on those specific systems only but not on others..

soniCron

Elias said:

Unless compiling for a pre-MMX compatible system, using an optimizing compiler will be faster than pure 386 assembly. And more portable. Period.

I'm just saying, if nobody is maintaining the 386 ASM, then chuck it, because it's getting in the way of the compiler's optimizations.

Matthew Leverton

Has anyone posted a test that can be downloaded? If two static binaries are provided for Windows users, testing could be done quite quickly.

Elias

Quote:

I'm just saying, if nobody is maintaining the 386 ASM, then chuck it, because it's getting in the way of the compiler's optimizations.

Yes, as I said in this (or another) thread already, I'm happy myself about the prospect of removing the current asm code.

Quote:

Has anyone posted a test that can be downloaded? If two static binaries are provided for Windows users, testing could be done quite quickly.

Yes, that would be nice. Allegro's test program is quite awkward to use, without any automated benchmark feature.

HoHo

Quote:

Has anyone posted a test that can be downloaded?

I'm working on it, but it doesn't move as fast as I would like because I have so many other things to do.

I think for a start the Allegro test program with some (big) changes would be enough. E.g. when executed with "--profile test.cfg" it would run through some tests (selected in the input file) using different bit depths and drivers.

Evert

Quote:

I just read the changelog for everything between 4.03 to 4.1.18, and there were NO references to any fixes to ANY 386 or DOS code.

Changes from 4.1.14 to 4.1.15 said:

Henrik Stokseth enabled the asm color converters for the X11 port.

Although I think this is the most recent major addition. As Elias pointed out, this was included because it led to a speed increase. If benchmarks now indicate the C version is faster, well... that's peculiar and something to be investigated.
As for fixes specific to the DOS/DJGPP code, there have been several fixes, but they are not clearly marked in the docs as being DOS specific.

Also, if there are no fixes, I think that is more a sign that there are no known bugs in those portions of code. The DOS and 386 assembler code are probably the oldest parts of the library. As long as there are no changes to the internals, they do not need to be touched or modified and no new bugs appear.

Anyway, it will stay for 4.2, we can discuss removing or disabling the code after that. But, as long as there are people who want to maintain the DOS or 386 code they are welcome to do so (and yes, there are people actually interested in that).

Quote:

I plan on getting involved in the development of the core system once I have some free time to do so.

None of us have much time anyway, so you should fit right in now.

Quote:

But I cannot, for the life of me, understand why there's even a discussion of whether or not to update the code from 386 support!

Because it works and has been well tested over the years and you should take care before you change something that works. It may be that the asm code is slower than the code the compiler produces on the latest generation of processors. As I said, a year ago this was not the case.
The C only version already exists and can be made the default in due time if that makes sense. But it will have to be looked into and considered first.

The first step to making the C version more useful, in my opinion, would be to get it to compile in MSVC without the aid of MinGW or DJGPP.

Michael Jensen

Quote:

Has anyone posted a test that can be downloaded? If two static binaries are provided for Windows users, testing could be done quite quickly.

that would kick arse, donkey even... especially if it was provided also for DOS users and dumped the results to a text file...

I left make running overnight and the default Allegro build should be done on my 586... How do you build the "C only" version with djgpp?

Quote:

and people are having to make an effort trying to get even a 486 computer to test stuff

People out there still use 486s, it's just us developers that don't... besides I've got a 486 that I can test this on next; but it is a lot of effort, I'm going to reuse the stuff the 586 compiled (actually, I'm just going to use the same hdd 8-) )

soniCron

Evert said:

...if there are no fixes, I think that is more a sign that there are no known bugs in those portions of code.

It's also a sign of neglected code. Assembly isn't something you write once and leave. As processors change, you update and add to it, if possible. The point of writing in assembler is that you can utilize the maximum potential of the processor. If you're not updating the code, you're not utilizing the maximum potential of the processor. I'm not saying it doesn't work, I'm saying it doesn't work well enough.

Evert said:

Because it works and has been well tested over the years and you should take care before you change something that works.

Are you still using an Apple II? Progress is worth change.

I'm surprised that after almost 10 years of hardware accelerated graphics, Allegro still won't use the blending features found on any cheap 3D card. If it's a problem of compatibility, use different rasterizers: software and hardware. This isn't a new concept, the two biggest graphics APIs come to mind: OpenGL and DirectX.

And please don't tell me that "it's coming", because I know. 4.3x is supposed to have it. I just don't understand why these optimizations haven't been implemented yet, is all. If someone can tell me, please do. I'm not upset that Allegro doesn't use the latest and greatest, but I'm upset that nobody seems to be saying why.

Michael Jensen said:

People out there still use 486s, it's just us developers that don't...

If the developers don't still use 486s, then they're probably not developing for them. Which means that the 386/486 users are getting little to no benefit from using the latest version of Allegro. The non-architectural changes since the last major 386 and DOS code updates are negligible, and make almost no, if any, difference to a developer on a 386.

The 386 and DOS users have their Allegro 4.03. I'm not saying that it should be cut out before 4.3x, but if the 386 assembly is not updated for 4.3x, you're still backboning on an inferior technology. Like running Windows on top of DOS. It's a fruitless method.

HoHo

Quote:

I'm surprised that after almost 10 years of hardware accelerated graphics, Allegro still won't use the blending features found on any cheap 3D card.

Do you actually have any idea how different OpenGL/DX and Allegro are? In its current state there is almost no way Allegro could use any accelerated functions of modern gfx cards. In the (near?) future, when OpenGL gets added to Allegro, these things will change, but not until then.

Bob

Quote:

I just don't understand why these optimizations haven't been implemented yet, is all

Those optimizations are incompatible with Allegro's API.

Evert

Quote:

I just don't understand why these optimizations haven't been implemented yet

Because the big rewrite `Allegro5' was supposed to have it all.

Not entirely accurate, but it is a big part of the reason no one bothered to try and add some things: because Allegro5 was supposed to have it anyway and no one wanted to do double work.

soniCron

HoHo said:

In the (near?) future, when OpenGL gets added to Allegro, these things will change, but not until then.

That is exactly my point. "(near?) future". Nobody is even sure when it will be implemented. It's been 2 years since the last "stable" version of Allegro came out. It'll probably be another 2 years before OpenGL support comes to Allegro.

Bob said:

Those optimizations are incompatible with Allegro's API.

Not entirely. Phasing in certain features is a good way to move an API from one direction to another. The primitive drawing could be done in 3D. Any bitmap objects can be textures. (There are system bitmaps, video bitmaps, etc, why not texture bitmaps). Writing to these texture bitmaps would be slow, but if you're developing with that in mind, you could simply not do per-pixel drawing, or updating of bitmaps, very often.

Blitting from a bitmap to the screen would then be done via texturing a GL quad primitive. There's nothing the GUI draws that can't be done in 3D. Most of the blending functions wouldn't be able to translate without pixel shaders, and I admit that would be a downside. But how often do people use the "saturate" blender? Stretching, rotating, lighting, shading, and flipping sprites would be insanely easy in GL. With the exception of writing directly to the screen, every drawing function can be done with GL, albeit slower if you try to draw directly to textures every frame. But using GL as a rendering device could be explicitly defined in the source, so older programs will use the software rendering, and programs wanting to use GL can. Obviously, those using GL wouldn't work in some environments, like DOS (unless you also included Mesa), but drawing 200 half-transparent sprites is gonna suck any way you do it on a 386. Oh, and the 3D drawing functions could actually draw in...3D.

I guess my whole point boils down to this: Why is fblend still an external library?! If it is a highly recommended library for any Allegro developer, then why does it even exist? Why is it not in the Allegro library? That makes no sense. It's this kind of backassward development that led to my developing my own software and hardware rasterizer for Allegro. This kind of stuff should be in Allegro to begin with.

IMHO, Allegro is easily the best game development library to work with. Even though SDL is designed for any form of cross-platform multimedia development, there's a reason why it's the most widely used free library for making games. It's not at all easier to use. But nobody developing SDL is saying "and you should take care before you change something that works.". But having to use an external library for decently fast blending means it doesn't work!

I'm done.

Thomas Fjellstrom

Quote:

It'll probably be another 2 years before OpenGL support comes to Allegro.

AllegroGL. It already has nearly complete Allegro Mode support.

All of your points boil down to no one having time to do the necessary work. You want it done that bad, do it yourself.

soniCron

Thomas Fjellstrom said:

AllegroGl. Already has nearly complete Allegro Mode support.

This just proves my point as with fblend. Why is it not part of the Allegro core?

Thomas Fjellstrom said:

All of your points boil down to no one having time to do the necessary work.

I disagree. I have a feeling it has to do with lack of project organization. There's no public roadmap that I can find. I hear about this magical roadmap all the time, and people keep referencing it, but I have yet to see it. If it does exist, I would appreciate someone telling me where I can find it. I, for some time, wanted to get involved with Allegro, but I have no idea how. There's no "head honcho" to get in touch with. The website offers little information on getting involved, as well. The SourceForge page has 5 admins listed, with no indication of who to contact to help support the project.

Thomas Fjellstrom said:

You want it done that bad, do it yourself.

Like I said in my last post, I am writing a software and hardware rasterizer. Please read my posts thoroughly before responding to them.

Thomas Fjellstrom

I don't have the time or skills to integrate either FBlend or AllegroGL into Allegro proper.

The few devs we do have barely have time to get bug fixes and releases done.

Quote:

I am writing a software and hardware rasterizer.

What does that have to do with anything?

Want to get involved? Get on the AD list. That's mentioned on the home page.

edit: Just what are you doing for Allegro that justifies calling the dev team lazy and unorganized?

soniCron

Thomas Fjellstrom said:

What does that have to do with anything?

Do you know what a rasterizer does?

Thomas Fjellstrom

yup. read my edit.

soniCron

edit: misread your edit
edit: Thomas Fjellstrom, thank you. I will join the dev list.

Thomas Fjellstrom said:

Just what are you doing for allegro?

Give me a couple more weeks, and you will see. You may not hate me anymore when I'm done.

Edit: Well, now, Thomas, after reading some of the mail list, I retract what I said about them being unorganized (though I never said they were lazy). When most of the development progress takes place in a mail list, it makes it seem, to the public, that there's not much going on. Obviously I was wrong, though it would be nice to have more frequent updates to the homepage. But that's just me being picky!

Kitty Cat

Quote:

Why is fblend still an external library?

I asked this same question, and I was told it was because they didn't want two separate blending APIs in Allegro, even if they don't directly conflict with each other. I said, and still say, it's better to have two blending APIs with one fast one than just one slow one.

The only thing I can think of currently for not integrating FBlend's blending functions into Allegro is that FBlend's blending functions don't work on video or system bitmaps at all (not even slowly/falling back on Allegro's current ones). As for AllegroGL, last I heard Bob said it wasn't quite done enough to integrate into Allegro.. particularly, it's missing vtable functions (which cause crashes when used).

IMHO, once those are done, I'm all for them being integrated.. but that isn't going to happen in time for 4.2. 4.3 will likely see something, and (good news for that) 4.3 has been worked on some already, so it shouldn't be terribly long after 4.2 before that sees daylight.

soniCron

Kitty Cat said:

I asked this same question, and I was told it was because they didn't want two separate blending APIs in Allegro, even if they don't directly conflict with each other.

Well, I'm currently merging the Allegro blending code with fblend so they both work with the regular Allegro blend functions. I'll probably have something to submit by tomorrow.

Bob

Quote:

Most of the blending functions wouldn't be able to translate without pixel shaders

Sorry, no. You can't read the destination buffer from a pixel shader.

Quote:

Not entirely. Phasing in certain features is a good way to move an API from one direction to another.

You can phase things in, but you can't phase things out. Allegro 4.x still needs to be source-compatible with 1.0.

Quote:

Any bitmap objects can be textures. (There are system bitmaps, video bitmaps, etc, why not texture bitmaps). Writing to these texture bitmaps would be slow, but if you're developing with that in mind, you could simply not do per-pixel drawing, or updating of bitmaps, very often.

AllegroGL already implements all of this.

Quote:

Stretching, rotating, lighting, shading, and flipping sprites would be insanely easy in GL.

You'd think so... Allegro has lots of neat little corner cases that makes this slightly more complicated. GPUs have a bunch more corner cases that makes this a lot more difficult.

For example, masked_blit in AllegroGL is actually implemented by a dynamic bitmap vtable change, based on the current state and which GPU you are running on.
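
The vtable change described here can be pictured with a minimal, self-contained C sketch. Everything in it (the `sketch_bitmap` and `sketch_vtable` types, the two blit routines, `select_vtable`) is hypothetical and only illustrates the general technique of swapping a table of function pointers at run time. It is not AllegroGL's actual code and uses no Allegro headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real Allegro or AllegroGL structures. */
struct sketch_bitmap;

typedef struct sketch_vtable {
    const char *name;
    /* Returns an id naming the path taken, so the sketch is observable. */
    int (*masked_blit)(struct sketch_bitmap *src, struct sketch_bitmap *dst);
} sketch_vtable;

typedef struct sketch_bitmap {
    const sketch_vtable *vtable;
} sketch_bitmap;

enum { PATH_GENERIC = 1, PATH_FAST = 2 };

/* Slow path that is assumed to work on every configuration. */
static int masked_blit_generic(sketch_bitmap *src, sketch_bitmap *dst)
{
    (void)src;
    (void)dst;
    return PATH_GENERIC;
}

/* Path that relies on some (assumed) hardware feature being present. */
static int masked_blit_fast(sketch_bitmap *src, sketch_bitmap *dst)
{
    (void)src;
    (void)dst;
    return PATH_FAST;
}

static const sketch_vtable vtable_generic = { "generic", masked_blit_generic };
static const sketch_vtable vtable_fast    = { "fast",    masked_blit_fast };

/* Re-point the bitmap at the table that matches the current capabilities. */
static void select_vtable(sketch_bitmap *bmp, int has_fast_path)
{
    assert(bmp != NULL);
    bmp->vtable = has_fast_path ? &vtable_fast : &vtable_generic;
}

/* Callers always go through the table, so they never see the switch. */
static int sketch_masked_blit(sketch_bitmap *src, sketch_bitmap *dst)
{
    assert(dst != NULL && dst->vtable != NULL);
    return dst->vtable->masked_blit(src, dst);
}
```

Calling `select_vtable(&bmp, 1)` before `sketch_masked_blit()` routes the call to the fast routine; calling it with 0 falls back to the generic one. The caller's code stays the same either way, which is the point of the indirection.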

Quote:

Obviously, those using GL wouldn't work in some environments, like DOS (unless you also included Mesa

AllegroGL already works under DOS, with MESA.

Quote:

Why is fblend still an external library?!

There are many reasons. One of them is that there is no one to maintain the code. I simply don't have time for this anymore. And the FBlend code isn't exactly trivial either.

Quote:

I have a feeling it has to do with lack of project organization.

No amount of organization makes up for the lack of manpower.

Quote:

Is that true? I've never tested it (and recommended against it), but the code is written in a way that makes this possible. Of course, there could be bugs...

Besides, the more common case of sourcing from memory bitmaps and writing to the screen does work.

Kitty Cat

Quote:

Is that true?

I thought I heard that FBlend only works on memory bitmaps. I could be wrong or thinking of something else, though.

soniCron

Bob said:

You can phase things in, but you can't phase things out. Allegro 4.x still needs to be source-compatible with 1.0.

You can phase things out, it's just a matter of wanting to. A lot of work seems to be going on with Allegro 4.x, but not so much with 5.x. And I'm assuming that 5 will break API compatibility?

Bob said:

AllegroGL already implements all of this.

But Allegro does not. Yet again, making library authors implement features that should be in Allegro.

Bob said:

GPUs have a bunch more corner cases that makes this a lot more difficult.

For one, nothing has to be implemented by the GPU. For another, sorry, but moving, rotating, scaling, etc. a quad is negligible in GL.

Bob said:

AllegroGL already works under DOS, with MESA.

Forwarded to: Department of Redundancy Department

Bob said:

I simply don't have time for this anymore. And the FBlend code isn't exactly trivial either.

The last stable fblend release was in 2001. The last stable release of Allegro was in 2003. I think, sometime after a stable release of fblend and before the 4.03 release of Allegro, something could have been integrated. In those 2 years.

Bob said:

No amount of organization makes up for the lack of manpower.

This has already been addressed. Please read my previous posts.

Thomas Fjellstrom

If it's that big a problem for you, DO IT. I don't have the time, the AUTHOR of the addons you're complaining about doesn't have the time, and the DEVS don't have the time...

Bob

Quote:

But Allegro does not. Yet again, making library authors implement features that should be in Allegro.

That's not how things happened, however. AllegroGL started out as a quick hack to prove that it was possible (and because the original AGL devs also wanted it).

Slowly, we (== mostly me) started filling in the vtables. We don't implement it all yet. Some of it is difficult; other features were impossible to implement because Allegro just didn't let us. Thankfully, this has been mostly fixed in 4.2.

Quote:

For one, nothing has to be implemented by the GPU.

Huh?

Quote:

For another, sorry, but moving, rotating, scaling, etc a quad is negligible in GL.

- You wouldn't use a quad if you blit from a memory bitmap. That's just wasteful.
- Allegro has a funny way of rotating things.

Since you have absolutely no idea what you're talking about, lemme paste the rotation code from AllegroGL:

static void allegro_gl_screen_pivot_scaled_sprite_flip(struct BITMAP *bmp,
  struct BITMAP *sprite, fixed x, fixed y, fixed cx, fixed cy, fixed angle,
  fixed scale, int v_flip)
{
   double dscale = fixtof(scale);
   GLint matrix_mode;
   AGL_LOG(2, "glvtable.c:allegro_gl_screen_pivot_scaled_sprite_flip\n");

#define BIN_2_DEG(x) ((x) * 180.0 / 128)

   glGetIntegerv(GL_MATRIX_MODE, &matrix_mode);
   glMatrixMode(GL_MODELVIEW);
   glPushMatrix();
   glTranslated(fixtof(x), fixtof(y), 0.);
   glRotated(BIN_2_DEG(fixtof(angle)), 0., 0., -1.);
   glScaled(dscale, dscale, dscale);
   glTranslated(-fixtof(x+cx), -fixtof(y+cy), 0.);

   do_masked_blit_screen(sprite, bmp, 0, 0, fixtoi(x), fixtoi(y),
                         sprite->w, sprite->h, v_flip ? V_FLIP : FALSE, FALSE);
   glPopMatrix();
   glMatrixMode(matrix_mode);

#undef BIN_2_DEG

   return;
}

do_masked_blit_screen() and associated helper functions form another ~500 lines of code.
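
For readers unfamiliar with the angle handling in the pasted function: Allegro angles are 16.16 fixed-point values in which 256 units make a full circle, which is why `BIN_2_DEG` multiplies by 180.0 / 128 before the value is handed to `glRotated`. The sketch below reproduces just that conversion in plain C. The `sketch_` types and helpers are local stand-ins written for this illustration so it builds without Allegro; they are assumptions, not the library's own definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for Allegro's 16.16 fixed-point type. */
typedef int32_t sketch_fixed;

/* Fixed to double: divide by 2^16. */
static double sketch_fixtof(sketch_fixed v)
{
    return (double)v / 65536.0;
}

/* Integer to fixed: multiply by 2^16 (the range check avoids overflow). */
static sketch_fixed sketch_itofix(int v)
{
    assert(v >= -32768 && v <= 32767);
    return (sketch_fixed)v * 65536;
}

/* 256 binary angle units make a full circle, so 128 units are 180 degrees. */
static double bin_to_deg(double bin_angle)
{
    return bin_angle * 180.0 / 128.0;
}

/* The same chain the pasted code uses: fixed -> double -> degrees. */
static double fixed_angle_to_degrees(sketch_fixed angle)
{
    return bin_to_deg(sketch_fixtof(angle));
}
```

With these helpers, a binary angle of 64 comes out as 90 degrees and 256 as 360 degrees.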

Quote:

It's not an issue of integration. Sure, it's really easy to copy the code from FBlend, and paste it in Allegro. That's not the problem. Really, it's not.

Quote:

This has already been addressed. Please read my previous posts.

No it doesn't. But feel free to try anyway. I'll vote against the patch when you get around to submitting it.

soniCron

Thomas Fjellstrom said:

If it's that big a problem for you, DO IT. I don't have the time, the AUTHOR of the addons you're complaining about doesn't have the time, and the DEVS don't have the time...

For one, there's no need to be hostile. Another thing:

Daniel Kinney said:

Well, I'm currently merging the Allegro blending code with fblend so they both work with the regular Allegro blend functions. I'll probably have something to submit by tomorrow.

Please read the thread thoroughly before getting upset. In addition, my point has nothing to do with the library not being integrated, per se, but with the development environment that prevents things like this from ever being implemented in the code. The lack of willingness to upgrade methods is counterproductive, and is the reason Allegro is well behind SDL in popularity and common usage, despite having a better written API.

So, please do not tell me to "do it yourself" anymore. I am aware of that option. It's not about anybody doing it themselves, it's about a community effort to better the library, and hostile and corrosive comments and attitudes do not promote such productive activities.

Just because you don't like what I have to say does not mean that it should not be done. All talk and no "do" is just as bad as no talk, no "do".

Thomas Fjellstrom

Yeah, I read what you had to say, but really, I just sorta glossed over it all.

Quote:

The lack of willingness to upgrade methods is counterproductive, and is the reason Allegro is well behind SDL in popularity and common usage, despite having a better written API.

Please read what others have to say ::) It's not a lack of willingness; we all want a better Allegro, but Allegro was committed to uber backwards and forwards compatibility long ago, and we can't just drop that.

Which is where the almighty Allegro 5 came in. It was supposed to be the be-all and end-all replacement, but it died. Now 4.3+ is where we are putting our effort; it will (albeit slowly) change and reorganise Allegro into something more modern.

Elias

Quote:

There's enough of a roadmap to keep us busy for a year or two. With Allegro 5 (which is dead btw), the approach was to build up an advanced roadmap, with having nobody to develop it. Right now, we still have no developers with lots of time, but we're now trying to gradually implement pieces of the old Allegro 5 design, and see where it gets us. So far, we're in the progress of releasing a new major version, and already have a completely new events API and beginnings of the new gfx API in CVS.

Quote:

If it does exist, I would appreciate someone telling me where I can find it. I, for some time, wanted to get involved with Allegro, but I have no idea how. There's no "head honcho" to get in touch with. The website offers little information on getting involved, as well. The SourceForge page has 5 admins listed, with no indication of who to contact to help support the project.

Well, 3 of them are Evert, Peter and me.. and we're all here in the forums anyway. As for some sort of guide for new developers, I agree that there's not much. I started collecting some information on the wiki as I discovered it: http://awiki.strangesoft.net/bin/view/Main/AllegroDev There's nothing really useful there yet.. but the one-paragraph roadmap it has is about as much as there is.

soniCron

Bob said:

- You wouldn't use a quad if you blit from a memory bitmap. That's just wasteful.

You wouldn't use a memory bitmap if you wanted to move, stretch, rotate, etc. sprites.

Bob said:

Since you have absolutely no idea what you're talking about, lemme paste the rotation code from AllegroGL: (cut)

Exactly, what about this is hard? It's quite sensical, though not necessarily typical.

Bob said:

It's not an issue of integration. Sure, it's really easy to copy the code from FBlend, and paste it in Allegro. That's not the problem. Really, it's not.

Then what is it?

Bob said:

I'll vote against the patch when you get around to submitting it.

And yet another example of corrosive behavior.

Kitty Cat

Quote:

You wouldn't use a memory bitmap if you wanted to move, stretch, rotate, etc. sprites.

Regardless if you would or wouldn't, Allegro allows it, so AllegroGL would have to work with it.

Bob

Quote:

You wouldn't use a memory bitmap if you wanted to move, stretch, rotate, etc. sprites.

Yes you would. You can't rotate video bitmaps in Allegro, so existing Allegro apps are already using rotated memory bitmaps.

Quote:

Then what is it?

Maintenance, as mentioned in previous posts.

soniCron

I concede. Have a nice day.

Evert

Quote:

Why is fblend still an external library?! If it is a highly recommended library for any Allegro developer, then why does it even exist? Why is it not in the Allegro library?

The answer to all those questions, and all questions similar to `why hasn't ... been added to Allegro?', is that we have too many people saying what should be done and what should and should not be in the library and too few people actually contributing code.
If you want to lend a hand, then by all means do so. Allegro sorely lacks developers.

Quote:

lead to my developing my own software and hardware rasterizer for Allegro. This kind of stuff should be in Allegro to begin with.

What are you complaining about? There was something missing in Allegro, you added it. Great. Now, if you want it to be in Allegro, submit a patch.

Quote:

But nobody developing SDL is saying "and you should take care before you change something that works.".

You know, there are people who think that Allegro is evil for breaking compatibility. I don't know what the hell they're talking about, as Shawn said in his big writeup, backward compatibility is one of Allegro's strengths and priorities. Part of that compatibility is a certain level of conservatism when it comes to accepting code changes.
Does it make sense to replace the asm code with C code if that works better and is easier to maintain? Hell, yes - in fact my personal interest in the asm code is zero since I can't use it anyway. But you do not change that overnight. There are subtleties in Allegro's source you and I are not aware of. There's a reason there is no real support for building a pure C version in Windows. The reason is that the resulting DLL will be incompatible with the DLL containing the asm code.
Does it make sense to have faster and better blenders in Allegro? Hell, yes. But someone has to write the code for that, test it and make sure it integrates well.
Shawn said, and I agree with him, that in order to properly administer a project you have to be a complete and utter nazi when it comes to changing things. If you start making a lot of big changes without carefully considering what you do and how it affects other things, you will end up breaking the code, which does everyone a disservice.
Change and improve things, but always be careful and sceptical about what you're trying to do.

Quote:

There's no "head honcho" to get in touch with.

I guess Eric Botcazou is still our Dictator because no one has succeeded him in that role.

Quote:

The website offers little information on getting involved, as well. The SourceForge page has 5 admins listed, with no indication of who to contact to help support the project.

Either of the admins will do, but the best thing would be to just post on the mailinglist, or these forums.
At the very least, Elias, Peter, Kitty Cat and I read the forums.

Quote:

When most of the development progress takes place in a mail list, it makes it seem, to the public, that there's not much going on.

I used to post weekly summaries of the main topics on AD a couple of years ago, while Peter Hull did the same for Allegro5. I'm not sure if it makes sense to try to restart something like that, but I do think that the mailinglist is a better place for discussion than a public forum.
Also, for future reference, you should check if anything has been going on and if so how much before giving the impression that the developers don't actually do anything. As it is, we're all extremely busy.

Are you new to the message boards? If so, that explains a bit as well. Otherwise, you could have seen threads and discussion on Allegro development going on on and off.

Quote:

though it would be nice to have more frequent updates to the homepage.

I think Gregorz is on vacation. I'm sure he'll update the site when he has time (at least I hope so because I don't know who else would do it...)

Quote:

I think, sometime after a stable release of fblend and before the 4.03 release of Allegro, something could have been integrated. In those 2 years.

What you forget or don't realize is that people were working on the big rewrite-from-scratch Allegro5, which died because no one had the time or energy to complete it. In the meanwhile, work on the Allegro 4.1 branch was going on but a lot of new features were planned for Allegro5 anyway so not implemented. Why do double work, afterall?
When Allegro5 finally died completely it left Allegro itself in a rather dangerous situation. About a year ago, it was practically dead and development had slowly ground to a halt. All things considered, I think we've managed to get things going again fairly well. Better than I would have hoped for, actually.
But I will tell you that we will not try to make the same mistake again. The code will be evolved rather than recreated from scratch from 4.2 to something like the proposed 5.0, and work has already started on this. To focus fully on that, 4.2 has to mature though. That, for me, is the top priority at the moment.

Quote:

it's about a community effort to better the library, and hostile and corrosive comments and attitudes do not promote such productive activities.

Please realize that there have been too many people talking tall about what has to be done, the net result of which has so far been zero. No one is saying that you couldn't do a good job, but you'll have to show that before we believe it.
As for corrosive comments and attitudes, you may want to screen your own posts for some of that as well.

And to paraphrase Shawn again, to administer a project like Allegro is to be a nazi. If your suggestions and code are criticised, scrutinised and shot to pieces then that only means that we want to have it a bit better or integrate better with what we already have. Above all, it shows that we care about it and have taken the time to think about it.
It would be a bad sign if whatever you come up with were to be accepted without criticism.

And now I'm going to sleep.

Michael Jensen

Quote:

Bob said:
You can phase things in, but you can't phase things out. Allegro 4.x still needs to be source-compatible with 1.0.

Quote:

You can phase things out, it's just a matter of wanting to. A lot of work seems to be going on with Allegro 4.x, but not so much with 5.x. And I'm assuming that 5 will break API compatibility?

I could be off my rocker, but DX, for example, is backwards compatible, yet they change things internally all the time while leaving the existing API intact. For example, DX5 code works on DX9, but in DX9 it's all wrapped through to newer functions (3D functions, if I'm not mistaken)...

Quote:

not to be rude, but your posts seem rude, I have no problems looking at your ideas based on their own merits but you need a freaking personality adjustment...

Quote:

I'll vote against the patch when you get around to submitting it.

Quote:

And yet another example of corrosive behavior.

Bob didn't say he was voting against your patch because he disliked you, he dislikes the idea, and if anybody knows anything about what should be patched into allegro it's freaking Bob (IMHO)...

Quote:

If you want to lend a hand, then by all means do so. Allegro sorely lacks developers.

I had no idea allegro was in such bad shape...

Quote:

And to paraphrase Shawn again, to administer a project like Allegro is to be a nazi.

Damn allegro nazis -- you'd think we'd have more people willing to develop

Umm the 4.2.0 I have is failing on the DJGPP "C only" make, so I'm going to have to download the cvs and start over...

edit: I was fascinated to see the difference in speed between compiled sprite, and rle sprite earlier today tho (I'm not talking about looking at #s either, you could actually see it on the screen in modex!)

btw the only video modes I seem to get are 320x200x8, 640x480x8, 640x480x24 (ModeX, VESA1, and VESA1 respectively)

Elias

Quote:

Umm the 4.2.0 I have is failing on the DJGPP "C only" make, so I'm going to have to download the cvs and start over...

I'm not sure, but DJGPP might have its own problems with using C only. I may have missed something.. but has anyone actually had success with a C-only build on MSVC at all?

Milan Mimica

Quote:

Well, I'm currently merging the Allegro blending code with fblend so they both work with the regular Allegro blend functions. I'll probably have something to submit by tomorrow.

Even though patches like this one are going to be refused, we could create a repository of unofficial patches against (current?) allegro.

Elias

Quote:

Even though patches like this one are going to be refused, we could create a repository of unofficial patches against (current?) allegro.

Yes, I think it is a great idea. I've actually been planning to do something like this for some time, but never got around to it. Just a "community pack", which would be a distribution of Allegro together with the most useful addons. And ideally, wrapped up nicely, so it can be compiled/installed easily, maybe with a small installer script/program. In any case, it would not be completely trivial to maintain something like this. And if it is not maintained, then it would probably not be very useful.

I'm also not sure about how it is best done technically. One way would be to e.g. start a SF project, create a CVS repository, and put allegro and all the addons into that repository, together with a set of the necessary patches, then synching with allegro/addons as necessary. Or maybe have the installer download the relevant versions of all packages, and just keep the patches in CVS.

In any case, it should get easier than now (need to download every single addon from allegro.cc and build/install/integrate manually). And it shouldn't have to be maintained by the core Allegro developers, as long as there are so few.

Evert

Quote:

I was fascinated to see the difference in speed between compiled sprite, and rle sprite earlier today tho (I'm not talking about looking at #s either, you could actually see it on the screen in modex!)

Yup. Actually, compiled sprites are the fastest way to draw sprites on modern hardware too, if I remember the benchmarks I saw correctly. This is interesting because three years ago, the fastest way to draw sprites was by using hardware accelerated blits.

Quote:

btw the only video modes I seem to get are 320x200x8, 640x480x8, 640x480x24 (ModeX, VESA1, and VESA1 respectively)

320x200x8 is a standard VGA mode. There should be all sorts of other weird ModeX resolutions you can set though, 320x240 being one of the more useful ones. Those VESA modes sound about right, although I think my old 486 was capable of more than that.

Quote:

Umm the 4.2.0 I have is failing on the DJGPP "C only" make, so I'm going to have to download the cvs and start over...

Hmm... as far as I know it should work. Do provide some details if you still have problems with CVS.

Quote:

Even though patches like this one are going to be refused, we could create a repository of unofficial patches against (current?) allegro.

If by that you mean what Elias understood you to mean, basically a repository of addons, then that is a good idea but something that has been discussed so much that I'm surprised no one did it before now.
If you mean a repository of actual patches and diffs against Allegro itself, I recommend against that.

Milan Mimica

Quote:

If you mean a repository of actual patches and diffs against Allegro itself, I recommend against that.

Yeah, that's exactly what I meant. Unofficial patches like the Linux kernel has. I didn't really think allegro.cc would host them because this would be a collection of patches you (Allegro developers) have rejected. You would dislike them. Of course, this would work only if someone is willing to maintain these patches. Anyway, I would like to see an fblend integration patch.
And maybe some day we could merge these patches into a new AllegroProUltraSpecialEdition

Evert

Quote:

Anyway, I would like to see an fblend integration patch.

If someone integrates Allegro and fblend in such a way that fblend can work through Allegro's blending API, I'd prefer that they do it properly and use the blender vtable entry. In that case, it could be made a proper addon rather than an ugly unsupported hack.
That would be far more useful.

If patches are rejected, they're rejected for a reason, not because someone doesn't like someone else (I hope). A patch either doesn't fit in, doesn't solve the problem it set out to solve, or has side issues. Creating a fork of Allegro based on rejected patches is, in my opinion, a very bad idea.

Elias

Well, since the collection of such patches currently contains a total of about 0 patches, you should be more worried about someone writing such patches first

Evert

I have one that adds triple buffering to the plain X11 driver that was rejected.
It uses the scrolling API but uses a separate thread to scroll the screen so as not to block the user programme. This is probably a bad idea in general, but it helped me test some things related to triple buffering at some point.

Elias

Well, I wouldn't call that triple buffering, since one crucial element for triple buffering is vsync. You can't do triple buffering without vsync.. else you can as well use two pages. So yeah, the only use is to test something.. but what? I really doubt anyone but you would have a use for that patch

Evert

Quote:

Well, I wouldn't call that triple buffering, since one crucial element for triple buffering is vsync. You can't do triple buffering without vsync.. else you can as well use two pages.

Not quite, at least not as I would define triple buffering. The crucial part is that you don't have to wait for vsync. There's one page being displayed, one page scheduled to be displayed at the next update of the graphics card/driver (`vsync'), and one you use for drawing the next frame.
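
The three roles described here (displayed, scheduled, being drawn) can be sketched as a tiny state machine in plain C. This is only a bookkeeping model with hypothetical names (`page_state`, `pages_submit`, `pages_refresh`); it touches no video memory and is not the X11 patch or any Allegro API.

```c
#include <assert.h>
#include <stddef.h>

typedef struct page_state {
    int shown;    /* page currently on screen (0..2) */
    int pending;  /* page queued for the next refresh, or -1 if none */
    int drawing;  /* page the program is rendering into */
} page_state;

static void pages_init(page_state *s)
{
    assert(s != NULL);
    s->shown = 0;
    s->pending = -1;
    s->drawing = 1;
}

/* The program finished a frame: queue it and move on without waiting.
 * Returns 1 on success, 0 if an earlier frame is still waiting to be shown. */
static int pages_submit(page_state *s)
{
    assert(s != NULL);
    if (s->pending != -1)
        return 0;
    s->pending = s->drawing;
    /* Page indices are 0, 1 and 2, so the free page is 3 minus the other two. */
    s->drawing = 3 - s->shown - s->pending;
    return 1;
}

/* Called at the display refresh: the queued page, if any, becomes visible. */
static void pages_refresh(page_state *s)
{
    assert(s != NULL);
    if (s->pending == -1)
        return;
    s->shown = s->pending;
    s->pending = -1;
}
```

The point of the third page shows up in `pages_submit()`: the program hands over a finished frame and immediately gets a free page to draw into, instead of stalling until the refresh happens. It only has to wait if it finishes a second frame before the first one has been shown.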

Quote:

So yeah, the only use is to test something.. but what?

Code doing triple buffering. The main reason I don't ever support triple buffering in any of my programmes is that I can't really test easily if it works myself.
It has no real benefit otherwise, ie, animations are not smoother than when using double buffering (for obvious reasons) as would be the case on other platforms.

Quote:

I really doubt anyone but you would have a use for that patch

I don't know... sometimes people find that they want or need something when they realise that it's there.

Elias

Quote:

Yes, I was just going to edit when you replied. The correct statement would have been: there's simply no point in using triple buffering in the case of X11, since you could just as well use page flipping without vsync.

Quote:

Hm, a much better patch for that would be one to add a flip() method though, so you don't need any extra code besides a flag to use triple buffering.. well, in 4.3.x, we'll have that anyway

Thread #474843. Printed from Allegro.cc

P4 3.0@3.82 1M cache
512M ram single channel
2.6.11-gentoo-r4
gcc version 3.4.3 20050110

Resolution 800x600, driver X. Speed difference in %; >0 means C is better.

                                   32 bpp                   24 bpp                   16 bpp                    8 bpp
Function                         C     Asm  Diff %        C     Asm  Diff %        C     Asm  Diff %        C     Asm  Diff %
textout()                   171212  213753   -19.9   155685  263032  -40.81   191343  208104   -8.05   194637  196086   -0.74
blit() from memory          453115  300027   51.02   185657  264728  -29.87   466279  388827   19.92   363734  539238  -32.55
masked_blit() from memory   229041  163688   39.93   129498   86325   50.01   232047  231639    0.18   198995  269500  -26.16
draw_sprite()               416321  297268   40.05   245417  249798   -1.75   408864  344216   18.78   320361  495931   -35.4
draw_rle_sprite()           717764  421222    70.4   640240  387080    65.4   708827  517834   36.88   674117  508662   32.53
draw_compiled_sprite()      717493 1661973  -56.83   642139 1245370  -48.44   710376 1710608  -58.47   663957 2035349  -67.38
draw_trans_sprite()         176353   75453  133.73   124666   44480  180.27   164053   46482  252.94   353072  464093  -23.92
draw_trans_rle_sprite()     192268   88702  116.76   171235   47175  262.98   199796  209313   -4.55   432254  344263   25.56
draw_lit_sprite()           182740  153953    18.7   133052  140253   -5.13   178109  148202   20.18   325448  268232   21.33
draw_lit_rle_sprite()       204653   91639  123.33   191327   43380  341.05   222652   46045  383.55   622470  548296   13.53
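
The "speed difference" figures in the table above appear to be the relative gain of the C count over the asm count, i.e. (C / Asm - 1) * 100. That formula is inferred from the numbers, not stated in the thread, and it assumes the raw counts are "higher is better". A short C sketch of it, with a hypothetical function name:

```c
#include <assert.h>

/* Relative speed difference in percent; positive means the C build was faster.
 * Assumes both arguments are counts where a higher value is better. */
static double speed_difference_percent(double c_count, double asm_count)
{
    assert(asm_count > 0.0);
    return (c_count / asm_count - 1.0) * 100.0;
}
```

For the 32 bpp textout() row, `speed_difference_percent(171212, 213753)` gives roughly -19.9, matching the table.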
