RLE sprites are a good fit for isometric tile-based games (such as ufo2000). We don't need rotation or other advanced features, and very fast blitting and reduced memory usage are real advantages here.
But unfortunately Allegro RLE sprites become less usable when we need alpha blitting and lighting effects (nice features like fire and smoke, or simulated night missions).
Here is a list of problems with RLE sprites:
1. When blenders are used for the alpha channel or transparency, a callback is called for each pixel! That's very slow.
2. There are no standard blender functions that can both apply the alpha channel and tint the sprite to some color at the same time (alpha + lighting).
3. The get_rle_sprite() function is extremely slow because it uses getpixel() internally.
4. The RLE sprites it creates are not optimal: each line can contain an unnecessary trailing skip run of pixels.
5. Using the 'magic pink' color code as the end-of-line marker is not a very good choice. It adds an artificial limit on the maximum length of each chunk of pixels (127). Using 0 as the marker would also create better optimization possibilities on most architectures (on x86 only three instructions, 'cmp ..., 0' -> 'js' -> 'jz', are needed to select one of the three branches).
6. There seems to be a design problem in Allegro: it is very hard to tell RGB and RGBA bitmaps apart in 32bpp mode (after loading a PNG or TGA picture). Maybe each bitmap should also carry a flag indicating the presence or absence of an alpha channel, set by the image loaders?
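To make problem 5 more concrete, here is a minimal sketch of a line decoder for a hypothetical RLE layout that uses 0 as the end-of-line marker. The layout (positive count = solid run, negative count = skip, zero = end of line) and the name `draw_rle_line` are assumptions made for illustration; this is neither Allegro's actual format nor necessarily what spritelib does.

```cpp
#include <cstdint>

// Hypothetical RLE line layout (an assumption, not Allegro's real format):
//   n > 0  : n solid pixels follow and are copied to the destination
//   n < 0  : skip -n transparent pixels in the destination
//   n == 0 : end of line
// Returns a pointer to the first word after the end-of-line marker.
const int16_t *draw_rle_line(const int16_t *src, uint16_t *dst)
{
    for (;;) {
        int16_t n = *src++;
        if (n > 0) {
            // Solid run: copy n pixels.
            for (int i = 0; i < n; ++i)
                *dst++ = static_cast<uint16_t>(*src++);
        } else if (n < 0) {
            // Transparent run: leave the destination untouched.
            dst += -n;
        } else {
            // Zero marker: this line is finished.
            return src;
        }
    }
}
```

A single comparison of the command word against zero is enough to pick one of the three branches, which is the point made about 'cmp' / 'js' / 'jz' above.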
As a solution to some of these problems, we have created a set of optimized RLE sprite functions built on top of Allegro RLE sprites. They allow fast alpha blending and tinting to black. Sprites are versatile and contain an alpha channel only when needed. That means you can mix ordinary RLE sprites (fast blitting and low memory requirements) with sprites that really contain an alpha channel. These functions are also very portable and run well on ARM CPUs, as they were mostly developed as part of porting UFO2000 to the Nokia 770; see http://www.allegro.cc/forums/thread/561300 for more details. This code doesn't use any MMX extensions (for portability) and is just composed of unrolled and optimized Allegro RLE functions (shamelessly relicensed to GPL).
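For readers unfamiliar with the "alpha + tint to black" combination, here is a rough per-pixel sketch for 16bpp (RGB565). The formula and the function name `blend565` are assumptions for illustration only; the real spritelib code is unrolled and optimized and may compute this differently.

```cpp
#include <cstdint>

// Blend one RGB565 source pixel over a destination pixel.
//   alpha      : 0 (fully transparent) .. 255 (fully opaque)
//   brightness : 0 (black) .. 255 (unmodified source colour)
// Per-channel formula used in this sketch (an assumption):
//   out = dst + (src * brightness / 255 - dst) * alpha / 255
uint16_t blend565(uint16_t src, uint16_t dst, unsigned alpha, unsigned brightness)
{
    auto mix = [&](int s, int d) {
        int lit = s * static_cast<int>(brightness) / 255;  // tint towards black
        return d + (lit - d) * static_cast<int>(alpha) / 255;  // alpha blend
    };
    int r = mix((src >> 11) & 0x1F, (dst >> 11) & 0x1F);
    int g = mix((src >> 5) & 0x3F, (dst >> 5) & 0x3F);
    int b = mix(src & 0x1F, dst & 0x1F);
    return static_cast<uint16_t>((r << 11) | (g << 5) | b);
}
```

Doing both steps in one function is what a pair of separate standard blenders cannot offer (see problem 2).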
Here is the download link:
http://ufo2000.sourceforge.net/files/spritelib-20060305.tar.gz
Here is the interface part of these sprite functions:
/**
 * Create a versatile sprite, which can support alpha transparency
 * and different brightness levels.
 */
ALPHA_SPRITE *get_alpha_sprite(BITMAP *bmp);

/**
 * Destroy alpha sprite.
 */
void destroy_alpha_sprite(ALPHA_SPRITE *spr);

/**
 * Draws a darkened sprite with alpha transparency support. It is optimized
 * for 16bpp mode and is faster than allegro functions.
 *
 * @param dst         destination bitmap
 * @param src         source sprite
 * @param dx          target x coordinate
 * @param dy          target y coordinate
 * @param brightness  brightness (0 - black image, 255 - original unmodified image)
 */
void draw_alpha_sprite(BITMAP *dst, ALPHA_SPRITE *src, int dx, int dy, unsigned int brightness = 255);
Here are some benchmarks:
Athlon XP 2400+ (2.0 GHz, NForce2, DDR266)
gcc version 3.4.4
optimization flags: -O2 -fomit-frame-pointer
note: with '-march=athlon-xp' option added results are almost the same

"./test 16"
---
explosion sprites per second (spritelib) = 10309.3
explosion sprites per second (allegro rle) = 5181.3
explosion sprites per second (allegro bitmap) = 3367.0
---
normal fire sprites per second (spritelib) = 181818.2
normal fire sprites per second (allegro rle) = 173611.1
---
lit fire sprites per second (spritelib) = 147492.6
lit fire sprites per second (allegro rle) = 92592.6
---
alpha fire sprites per second (spritelib) = 102459.0
alpha fire sprites per second (allegro rle) = 73099.4

------------------------------------------------------------------------------

Nokia 770 Internet Tablet (250MHz OMAP1710)
gcc version 3.3.4
optimization flags: -O2 -fomit-frame-pointer

"./test 16"
---
explosion sprites per second (spritelib) = 823.7
explosion sprites per second (allegro rle) = 386.8
explosion sprites per second (allegro bitmap) = 273.2
---
normal fire sprites per second (spritelib) = 21376.7
normal fire sprites per second (allegro rle) = 16863.4
---
lit fire sprites per second (spritelib) = 16244.3
lit fire sprites per second (allegro rle) = 7961.8
---
alpha fire sprites per second (spritelib) = 13358.3
alpha fire sprites per second (allegro rle) = 6463.3

------------------------------------------------------------------------------

Nokia 770 Internet Tablet (250MHz OMAP1710)
gcc version 3.3.4
optimization flags: -O2 -fomit-frame-pointer -march=armv5te

"./test 16"
---
explosion sprites per second (spritelib) = 801.9
explosion sprites per second (allegro rle) = 388.8
explosion sprites per second (allegro bitmap) = 270.6
---
normal fire sprites per second (spritelib) = 30413.6
normal fire sprites per second (allegro rle) = 16683.4
---
lit fire sprites per second (spritelib) = 20145.0
lit fire sprites per second (allegro rle) = 7921.4
---
alpha fire sprites per second (spritelib) = 13161.4
alpha fire sprites per second (allegro rle) = 6379.2
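To read these numbers as speed-ups rather than raw rates, one can divide the spritelib figure by the Allegro RLE figure from the same run. The small helper below is only a convenience sketch (the function names are made up); the input values are copied from the Athlon XP results above.

```cpp
#include <cstdio>

// Speed-up of spritelib over Allegro RLE sprites: the ratio of the two
// "sprites per second" figures taken from the same benchmark run.
double speedup(double spritelib_sps, double allegro_rle_sps)
{
    return spritelib_sps / allegro_rle_sps;
}

// Prints the ratios for the four Athlon XP rows listed above.
void print_athlon_speedups()
{
    std::printf("explosion:   %.2fx\n", speedup(10309.3, 5181.3));
    std::printf("normal fire: %.2fx\n", speedup(181818.2, 173611.1));
    std::printf("lit fire:    %.2fx\n", speedup(147492.6, 92592.6));
    std::printf("alpha fire:  %.2fx\n", speedup(102459.0, 73099.4));
}
```

The gain is largest for the explosion sprites (roughly twice as fast) and smallest for the normal fire sprites, where only a few percent separate the two.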
Todo: Improve the get_alpha_sprite() function, as it relies on Allegro's get_rle_sprite() and is slow.
There are several options for what to do next:
1. Convert this code into an addon library
2. Add some improvements to the Allegro library itself, so that fast alpha blending becomes more widely available
3. Do nothing and keep this code entirely as a part of the UFO2000 project
That's why I need your feedback, and probably benchmark results on different CPU architectures.
Without even looking at your code I can tell you that the choice of GPL instead of the Allegro license will severely limit the acceptance of it...
Don't you think that the MMX version of color mixing in 32bpp I posted in the other thread could be used to further improve the performance?
This code is currently GPL licensed because it is part of the UFO2000 project right now (which is itself GPL licensed). If it becomes an addon library or gets (partially?) included into the Allegro library, it will of course have the Allegro license.
That's why I posted it here and am waiting for feedback. If no feedback is received, it will remain a part of UFO2000 (see option 3), as I don't feel like doing useless work. The current scope of this code is fast and portable 16bpp alpha blending for mem->mem blitting of RLE sprites (needed on the Nokia 770), and it seems to serve that purpose well. Depending on community interest, it might grow into something more useful.
My intention was also to eventually prepare some patches for inclusion into Allegro that improve RLE sprite support. Alpha blending is slow in Allegro mostly not because it lacks MMX or similar optimizations, but because it uses callbacks that slow everything down (see problem 1). Probably some standard blenders could be inlined into special versions of optimized functions, with the alpha blender as the first candidate. That would not serve UFO2000 well, though, as there is no standard blender for what it needs (see problem 2). So improving the performance of the current Allegro API would solve only some of the problems (but if these changes get accepted, that would be good too).
PS. A test program which verifies both performance and correctness (by comparing the results of the optimized and standard Allegro blenders) is included, so it should not be too hard to try it.
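The difference between a per-pixel callback and an inlined blender can be sketched like this. All names here are hypothetical and the 50% blender is only a stand-in; this is not Allegro's blender API, just an illustration of the two calling styles.

```cpp
#include <cstdint>

// Style 1: the blender is passed as a function pointer and is called once
// per pixel, roughly like a blender callback.
typedef uint16_t (*BlendFn)(uint16_t src, uint16_t dst);

void blend_run_callback(uint16_t *dst, const uint16_t *src, int n, BlendFn blend)
{
    for (int i = 0; i < n; ++i)
        dst[i] = blend(src[i], dst[i]);
}

// Style 2: the blender is a template parameter, so the compiler sees its
// body and has the chance to inline it into the loop.
template <class Blend>
void blend_run_inlined(uint16_t *dst, const uint16_t *src, int n, Blend blend)
{
    for (int i = 0; i < n; ++i)
        dst[i] = blend(src[i], dst[i]);
}

// Example blender: 50% mix of two RGB565 pixels (illustrative only).
struct HalfBlend565 {
    uint16_t operator()(uint16_t s, uint16_t d) const
    {
        // Clear the low bit of every channel, halve both pixels, then add.
        return static_cast<uint16_t>(((s & 0xF7DE) >> 1) + ((d & 0xF7DE) >> 1));
    }
};

// Plain function wrapper so the same blender can be used as a callback.
uint16_t half_blend565(uint16_t s, uint16_t d)
{
    return HalfBlend565()(s, d);
}
```

Both loops produce the same pixels. Whether the second one is actually faster depends on the compiler and the target, so it is worth measuring rather than assuming.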
Don't you think that the MMX version of color mixing in 32bpp I posted in the other thread could be used to further improve the performance?
I'll try it, thanks.
Edit: tried it, and it really does speed up the program, but only by about 5% (maybe I just have a fast CPU with slow memory, and memory is the bottleneck). In addition, it requires 'inline' to be changed to 'INLINE' for BlendColorsNoEmms, otherwise it actually gets even slower ('inline' is unfortunately only a hint for gcc, and it seems to ignore it):
#ifdef __GNUC__
#define INLINE inline __attribute__((always_inline))
#else
#define INLINE inline
#endif
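A short sketch of how such a macro might be applied to small per-pixel helpers. The helper functions below are hypothetical and are not taken from spritelib or from the MMX code mentioned above.

```cpp
#include <cstdint>

#ifdef __GNUC__
#define INLINE inline __attribute__((always_inline))
#else
#define INLINE inline
#endif

// Hypothetical helper: scale a 5-bit channel by a brightness in 0..255.
// Marked INLINE so that GCC expands it at every call site.
static INLINE unsigned scale_channel5(unsigned c, unsigned brightness)
{
    return (c * brightness) / 255;
}

// Darken only the blue channel of an RGB565 pixel, to keep the sketch short.
static INLINE uint16_t darken_blue565(uint16_t p, unsigned brightness)
{
    unsigned b = scale_channel5(p & 0x1F, brightness);
    return static_cast<uint16_t>((p & ~0x1Fu) | b);
}
```

On compilers other than GCC the macro falls back to plain 'inline', which remains only a hint.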
By the way, one more improvement (not related to alpha blending, and also not very portable) would be to add support for compiled sprites. When checking for clipping, there is a place in the code where we are sure that the sprite is not clipped at all. That means we can add a COMPILED_SPRITE member to the ALPHA_SPRITE struct, initialize it for sprites which do not have an alpha channel, and use it when no clipping is needed. This improves performance when no blending or tinting is required, but also increases memory requirements.
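The decision described here could look roughly like the sketch below. The `Rect` type, the exclusive right/bottom edge convention and the function names are assumptions for illustration; the real code would work with the clip fields of the destination bitmap.

```cpp
// Hypothetical clip rectangle with exclusive right/bottom edges
// (an assumption of this sketch, not a type from Allegro or spritelib).
struct Rect { int left, top, right, bottom; };

// True when a w x h sprite drawn at (dx, dy) lies entirely inside the
// clip rectangle, i.e. no clipping is needed.
bool is_unclipped(const Rect &clip, int dx, int dy, int w, int h)
{
    return dx >= clip.left && dy >= clip.top &&
           dx + w <= clip.right && dy + h <= clip.bottom;
}

enum DrawPath { PATH_COMPILED, PATH_RLE };

// Use the compiled sprite only when the sprite has no alpha channel, is
// drawn at full brightness (no tinting) and is not clipped; otherwise
// fall back to the RLE path.
DrawPath choose_path(const Rect &clip, int dx, int dy, int w, int h,
                     bool has_alpha, unsigned brightness)
{
    bool plain = !has_alpha && brightness == 255;
    return (plain && is_unclipped(clip, dx, dy, w, h)) ? PATH_COMPILED : PATH_RLE;
}
```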
Thanks for the INLINE hint. I would've expected the compiler to know better, though...
After a second look, it appears that I misled you somewhat. The compiler seems to inline BlendColorsNoEmms() fine in normal code.
But in my 'sprite.cpp' I make heavy use of '__attribute__((always_inline))' to force inlining; otherwise gcc refuses to inline some of the functions containing loops, and that hurts performance a lot (maybe it wouldn't with profile-guided optimization, but that's not convenient). Anyway, in this particular source file, with lots of force-inlined functions, gcc seems to take revenge on BlendColorsNoEmms() and DOES NOT inline it unless it also has '__attribute__((always_inline))'. That's weird.
There's a limit in GCC on the size of code that it will inline. You can change this limit with a command line switch, but I don't remember what it is.
By default it also doesn't inline most things with loops, generally because that would make the code size explode to insane proportions (since loops are unrolled).
Well, but what about RLE sprites and Allegro blenders? Is anybody using them? Or has everyone switched to OpenLayer already?
It would be interesting to see benchmarks of my little test program on a Pentium 4 and a Mac.
I'm interested and I'm going to take a look at it.
...
uhm... well, the explosion sprites I have don't have an alpha layer, so I can't use the alpha thingy, until I get a replacement from "somewhere" - sorry, that can take a while.