Improvement for allegro RLE sprites
serge

RLE sprites are good for isometric tile based games (such as ufo2000). We don't need rotation or any other advanced features, and very fast blitting and reduced memory usage have their advantages here.

But unfortunately allegro RLE sprites become less usable when we need alpha blitting and lighting effects (nice features like fire and smoke, or simulation of night missions).

Here is a list of problems with RLE sprites:
1. When blenders are used for the alpha channel or transparency, a callback is called for each pixel! That's very slow
2. There are no standard blender functions that can both apply the alpha channel and tint the sprite to some color at the same time (alpha + lighting)
3. Function get_rle_sprite() is extremely slow as it uses getpixel() internally
4. Created RLE sprites are not optimal - each line can contain an unnecessary trailing skip run of pixels
5. The use of the 'magic pink' code as the end of line marker is not a very good choice: it adds an artificial limitation on the maximal length of each chunk of pixels (127). Using 0 as the marker also creates better optimization possibilities for most architectures (on x86 we need only three instructions 'cmp ..., 0' -> 'js' -> 'jz' to select one of the three needed branches)
6. There seems to be a design problem in allegro, as it is very hard to tell RGB and RGBA bitmaps apart in 32bpp mode (after loading a PNG or TGA picture). Maybe each bitmap should also have some flag indicating alpha channel presence/absence, which should be set by image loaders?
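
To make points 4 and 5 more concrete, here is a minimal sketch of a line format that uses signed run codes with 0 as the end of line marker and drops the trailing skip run. It is only an illustration: the names MASK, encode_line() and draw_line() are hypothetical, the layout is not claimed to match allegro's or spritelib's actual in-memory format, and runs are assumed to be shorter than 32768 pixels.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed mask colour for 16bpp (RGB565 "magic pink"); hypothetical constant.
const uint16_t MASK = 0xF81F;

// Encode one scanline into signed 16-bit codes:
//   n > 0  : n solid pixels follow
//   n < 0  : skip -n transparent pixels
//   n == 0 : end of line
// A trailing transparent run is not stored at all, because the end of line
// marker already implies that nothing else is drawn on this line.
std::vector<int16_t> encode_line(const std::vector<uint16_t> &px)
{
    std::vector<int16_t> out;
    std::size_t i = 0;
    while (i < px.size()) {
        std::size_t start = i;
        if (px[i] == MASK) {
            while (i < px.size() && px[i] == MASK) i++;
            if (i == px.size()) break;              // trailing skip: omit it
            out.push_back(static_cast<int16_t>(-static_cast<int>(i - start)));
        } else {
            while (i < px.size() && px[i] != MASK) i++;
            out.push_back(static_cast<int16_t>(i - start));
            for (std::size_t k = start; k < i; k++)
                out.push_back(static_cast<int16_t>(px[k]));
        }
    }
    out.push_back(0);                               // end of line marker
    return out;
}

// Draw one encoded line onto dst without clipping. The sign of the run code
// selects one of three branches: solid run, skip run, or end of line.
// Returns a pointer just past the end of line marker.
const int16_t *draw_line(const int16_t *s, uint16_t *dst)
{
    for (;;) {
        int n = *s++;
        if (n > 0) {                                // solid run: copy pixels
            while (n--) *dst++ = static_cast<uint16_t>(*s++);
        } else if (n < 0) {                         // skip run: advance only
            dst -= n;
        } else {                                    // end of line
            return s;
        }
    }
}
```

With this layout, a line that ends in transparent pixels costs nothing extra, and the decoder only has to compare the run code against zero once per run.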

As a solution to some of these problems, we have created a set of optimized RLE sprite functions built on top of allegro RLE. They allow fast alpha blending and tinting to black. Sprites are versatile and contain an alpha channel only when needed. That means you can mix ordinary RLE sprites (fast blitting and low memory requirements) with sprites that really contain an alpha channel. These functions are also very portable and run well on ARM CPUs, as they were mostly developed as part of porting UFO2000 to the Nokia 770 (see http://www.allegro.cc/forums/thread/561300 for more details). The code doesn't use any MMX extensions (for portability) and is just composed of unrolled and optimized allegro RLE functions (and shamelessly relicensed to GPL :) ).
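
For readers who want to see what "alpha blending plus tinting to black" means per pixel, here is a straightforward sketch for 16bpp, assuming the RGB565 pixel layout. The function name blend565() is hypothetical and this is not the optimized spritelib code; it uses plain divisions for clarity, where a tuned version would use shifts and process several channels at once.

```cpp
#include <cstdint>

// Blend a source RGB565 pixel over a destination RGB565 pixel.
//   alpha      : 0 (fully transparent) .. 255 (fully opaque)
//   brightness : 0 (black image) .. 255 (original unmodified image)
inline uint16_t blend565(uint16_t src, uint16_t dst,
                         unsigned alpha, unsigned brightness)
{
    // Unpack both pixels into 5/6/5 bit channels.
    unsigned sr = (src >> 11) & 0x1F, sg = (src >> 5) & 0x3F, sb = src & 0x1F;
    unsigned dr = (dst >> 11) & 0x1F, dg = (dst >> 5) & 0x3F, db = dst & 0x1F;

    // Lighting: scale the source colour towards black.
    sr = sr * brightness / 255;
    sg = sg * brightness / 255;
    sb = sb * brightness / 255;

    // Alpha: weighted average of the darkened source and the destination.
    unsigned r = (sr * alpha + dr * (255 - alpha)) / 255;
    unsigned g = (sg * alpha + dg * (255 - alpha)) / 255;
    unsigned b = (sb * alpha + db * (255 - alpha)) / 255;

    return static_cast<uint16_t>((r << 11) | (g << 5) | b);
}
```

Both steps happen in one pass over the pixel, which is the combination that problem 2 says no standard blender provides.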

Here is the download link:
http://ufo2000.sourceforge.net/files/spritelib-20060305.tar.gz

Here is the interface part of these sprite functions:

/**
 * Create a versatile sprite, which can support alpha transparency
 * and different brightness levels.
 */
ALPHA_SPRITE *get_alpha_sprite(BITMAP *bmp);

/**
 * Destroy alpha sprite.
 */
void destroy_alpha_sprite(ALPHA_SPRITE *spr);

/**
 * Draws a darkened sprite with alpha transparency support. It is optimized
 * for 16bpp mode and is faster than allegro functions.
 *
 * @param dst destination bitmap
 * @param src source bitmap
 * @param dx target x coordinate
 * @param dy target y coordinate
 * @param brightness brightness (0 - black image, 255 - original unmodified image)
 */
void draw_alpha_sprite(BITMAP *dst, ALPHA_SPRITE *src, int dx, int dy, unsigned int brightness = 255);

Here are some benchmarks:

Athlon XP 2400+ (2.0 GHz, NForce2, DDR266)
gcc version 3.4.4
optimization flags: -O2 -fomit-frame-pointer
note: with '-march=athlon-xp' option added results are almost the same

"./test 16"
---
explosion sprites per second (spritelib) = 10309.3
explosion sprites per second (allegro rle) = 5181.3
explosion sprites per second (allegro bitmap) = 3367.0
---
normal fire sprites per second (spritelib) = 181818.2
normal fire sprites per second (allegro rle) = 173611.1
---
lit fire sprites per second (spritelib) = 147492.6
lit fire sprites per second (allegro rle) = 92592.6
---
alpha fire sprites per second (spritelib) = 102459.0
alpha fire sprites per second (allegro rle) = 73099.4

------------------------------------------------------------------------------

Nokia 770 Internet Tablet (250MHz OMAP1710)
gcc version 3.3.4
optimization flags: -O2 -fomit-frame-pointer

"./test 16"
---
explosion sprites per second (spritelib) = 823.7
explosion sprites per second (allegro rle) = 386.8
explosion sprites per second (allegro bitmap) = 273.2
---
normal fire sprites per second (spritelib) = 21376.7
normal fire sprites per second (allegro rle) = 16863.4
---
lit fire sprites per second (spritelib) = 16244.3
lit fire sprites per second (allegro rle) = 7961.8
---
alpha fire sprites per second (spritelib) = 13358.3
alpha fire sprites per second (allegro rle) = 6463.3

------------------------------------------------------------------------------

Nokia 770 Internet Tablet (250MHz OMAP1710)
gcc version 3.3.4
optimization flags: -O2 -fomit-frame-pointer -march=armv5te

"./test 16"
---
explosion sprites per second (spritelib) = 801.9
explosion sprites per second (allegro rle) = 388.8
explosion sprites per second (allegro bitmap) = 270.6
---
normal fire sprites per second (spritelib) = 30413.6
normal fire sprites per second (allegro rle) = 16683.4
---
lit fire sprites per second (spritelib) = 20145.0
lit fire sprites per second (allegro rle) = 7921.4
---
alpha fire sprites per second (spritelib) = 13161.4
alpha fire sprites per second (allegro rle) = 6379.2

Todo: Improve get_alpha_sprite() function as it relies on allegro get_rle_sprite() and is slow

There are several options for what to do next:
1. Convert this code into an addon library
2. Add some improvements to the allegro library itself, so that fast alpha blending becomes available to more
3. Do nothing and use this code entirely as a part of UFO2000 project :)

That's why I need your feedback and probably results of benchmarks on different CPU architectures.

gnolam

Without even looking at your code I can tell you that the choice of GPL instead of the Allegro license will severely limit the acceptance of it... :P

Fladimir da Gorf

Don't you think that the MMX version of color mixing in 32bpp I posted in the other thread could be used to further improve the performance? ;)

serge

This code is currently GPL licensed as it is part of the UFO2000 project right now (which is GPL licensed itself). If it becomes an addon library or gets (partially?) included into the allegro library, it will have the allegro license of course :)

That's why I posted it here and am waiting for feedback. If no feedback is received, it will remain a part of UFO2000 (see option 3) as I don't feel like doing useless work. The scope of this code is currently fast and portable 16bpp alpha blending for mem->mem blitting of RLE sprites (needed on the Nokia 770) and it seems to serve that purpose well. Depending on community interest, it might grow into something more useful :)

Also, my intention was to eventually prepare some patches for inclusion into allegro that improve RLE sprite support. Alpha blending is slow in allegro mostly not because it lacks MMX or other optimizations, but because it uses callbacks that slow everything down (see problem 1). Some standard blenders could probably be inlined into special versions of the optimized functions, with the alpha blender as the first candidate, but that would not serve UFO2000 well as there is no standard blender for what it needs (see problem 2). So improving the performance of the current allegro API would solve only some of the problems (but if these changes get accepted, that would also be good).
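
The difference between a per-pixel callback and an inlined blender can be sketched roughly like this. All names here (BlendFn, HalfBlend, draw_run_callback, draw_run_inlined) are hypothetical and chosen for illustration; this is neither allegro's blender API nor the spritelib code, and the 50% blender only assumes the RGB565 layout.

```cpp
#include <cstddef>
#include <cstdint>

// Callback style: the blender is only known at run time, so every pixel
// costs an indirect function call.
typedef uint16_t (*BlendFn)(uint16_t src, uint16_t dst);

void draw_run_callback(uint16_t *dst, const uint16_t *src,
                       std::size_t n, BlendFn blend)
{
    for (std::size_t i = 0; i < n; i++)
        dst[i] = blend(src[i], dst[i]);
}

// Inlined style: the blender is a type known at compile time, so the
// compiler is free to inline its body into the pixel loop.
template <typename Blender>
void draw_run_inlined(uint16_t *dst, const uint16_t *src,
                      std::size_t n, Blender blend)
{
    for (std::size_t i = 0; i < n; i++)
        dst[i] = blend(src[i], dst[i]);
}

// Example blender: 50% average of two RGB565 pixels. The mask 0xF7DE clears
// the lowest bit of each channel so the halves can be added without carry
// between channels.
struct HalfBlend {
    uint16_t operator()(uint16_t s, uint16_t d) const
    {
        return static_cast<uint16_t>(((s & 0xF7DE) >> 1) + ((d & 0xF7DE) >> 1));
    }
};

// The same blender wrapped as a plain function for the callback version.
inline uint16_t half_blend_fn(uint16_t s, uint16_t d)
{
    return HalfBlend()(s, d);
}
```

Both versions produce identical pixels; whether the second one is actually faster depends on the compiler and optimization flags, so it should be measured rather than assumed.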

PS. Test program which verifies both performance and correctness (by comparing results of optimized and standard allegro blenders) is included, so it should not be too hard to try it.

Fladimir da Gorf said:

Don't you think that the MMX version of color mixing in 32bpp I posted in the other thread could be used to further improve the performance? ;)

I'll try it, thanks :)

Edit: Tried it. It really does speed up the program, but only by about 5% (maybe I just have a fast cpu with slow memory and memory is the bottleneck). In addition, it requires 'inline' to be changed to 'INLINE' for BlendColorsNoEmms, otherwise it actually gets even slower ('inline' unfortunately is only a hint for gcc and it seems to ignore it):

#ifdef __GNUC__
#define INLINE inline __attribute__((always_inline))
#else
#define INLINE inline
#endif

By the way, one more improvement (not related to alpha blending, and also not very portable) would be to add support for compiled sprites. When checking for clipping, there is a place in the code where we are sure that the sprite is not clipped at all. That means we can add a COMPILED_SPRITE member to the ALPHA_SPRITE struct, initialize it for the sprites which do not have an alpha channel, and use it when no clipping is needed. This would improve performance when no blending or tinting is required, but would also increase memory requirements.
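
The selection logic for such a fast path could look roughly like the sketch below. ClipRect, fully_inside() and choose_path() are hypothetical names, the clip rectangle is assumed to have exclusive right and bottom edges, and nothing here is taken from the actual ALPHA_SPRITE implementation.

```cpp
// Hypothetical clip rectangle; right and bottom are assumed to be exclusive.
struct ClipRect {
    int left, top, right, bottom;
};

// True when a w x h sprite drawn at (dx, dy) lies entirely inside the clip
// rectangle, which means no clipping is needed at all.
inline bool fully_inside(const ClipRect &c, int dx, int dy, int w, int h)
{
    return dx >= c.left && dy >= c.top &&
           dx + w <= c.right && dy + h <= c.bottom;
}

enum DrawPath { PATH_COMPILED, PATH_RLE };

// Use the compiled sprite only when the sprite has no alpha channel, is drawn
// at full brightness (255) and is not clipped; otherwise use the RLE path.
inline DrawPath choose_path(bool has_alpha, unsigned brightness,
                            const ClipRect &clip,
                            int dx, int dy, int w, int h)
{
    bool plain = !has_alpha && brightness == 255;
    return (plain && fully_inside(clip, dx, dy, w, h)) ? PATH_COMPILED
                                                       : PATH_RLE;
}
```

The check is cheap compared to drawing, so the extra cost is mainly the memory for keeping both representations of a sprite.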

Fladimir da Gorf

Thanks for the INLINE hint. I would've expected the compiler to know better, though...

serge

Fladimir da Gorf said:

Thanks for the INLINE hint. I would've expected the compiler to know better, though...

After having a second look, it appears that I misled you somewhat. The compiler seems to inline BlendColorsNoEmms() fine in normal code.

But in my 'sprite.cpp' I heavily use the '__attribute__((always_inline))' option in order to force inlining, as otherwise gcc refuses to inline some of the functions containing loops, even though they affect performance a lot (maybe it wouldn't refuse if used with profile guided optimization, but that's not convenient). Anyway, in this particular source file with lots of functions with forced inlining, gcc seems to take revenge on the BlendColorsNoEmms() function and DOES NOT inline it unless it has '__attribute__((always_inline))' :) That's weird.

Kitty Cat

There's a limit GCC has for the size of code that it will inline. You can change this limit with some command line switch, but I don't remember what it is.

Thomas Fjellstrom

By default it also doesn't inline most things with loops, generally because that would make the code size explode to insane proportions (since loops are unrolled...).

serge

Well, but what about RLE sprites and allegro blenders? Is anybody using them? Or has everyone switched to OpenLayer already? ;)

It would be interesting to see benchmarks of my little test program on a Pentium 4 and a Mac.

Geoman

I'm interested and I'm going to take a look at it.
...
uhm... well, the explosion sprites I have don't have an alpha layer, so I can't use the alpha thingy, until I get a replacement from "somewhere" - sorry, that can take a while.

Thread #571021. Printed from Allegro.cc