Pushing allegro performance (sprites per second) to its limits
kovarex

This is what we are already doing:

  • Using atlases, optimising them to have the most frequent sprites in the most used atlases.

  • Using al_hold_bitmap_drawing

  • Running Game logic and render data preparation in separate thread (parallel to the rendering)

  • Caching the tiled background and moving it + drawing the edges when the game view moves around.

This way we managed to get quite far (~27 000 sprites drawn per tick at 60 FPS) but it is still not enough with maximum zoomout in crowded places. (Screenshot for reference; warning, it is huge: http://www.factorioforums.com/download/big_screenshot.jpg)

While profiling, I found that in _draw_tinted_rotated_scaled_bitmap_region a big part of the time is lost manipulating the backup transform:

  ...
  ALLEGRO_TRANSFORM backup;
  ...
  al_copy_transform(&backup, al_get_current_transform());
  ...
  al_compose_transform(&t, &backup);
  ...
  al_use_transform(&backup);
  ...

When I commented these lines out, the game worked almost perfectly (only some primitives were off, which is solvable) and it proved to be a noticeable performance boost, as almost all sprites are drawn this way.

Another thing: most of the pictures are not rotated, so an optimised version of the function that simply ignores rotation, used in those cases, might help as well.
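
A minimal sketch of what such an unrotated fast path could look like, next to the general path. This is not Allegro's code: the Quad struct and both function names are hypothetical, and the cosine and sine are assumed to be precomputed by the caller.

```c
/* Hypothetical stand-in for a sprite quad: four corners in screen space. */
typedef struct { float x[4], y[4]; } Quad;

/* General path: scale, rotate about the pivot (cx, cy), then translate.
   cos_a and sin_a are the precomputed cosine and sine of the angle. */
static void quad_rotated(Quad *q, float w, float h, float cx, float cy,
                         float sx, float sy, float cos_a, float sin_a,
                         float dx, float dy)
{
    const float lx[4] = { 0.0f, w, w, 0.0f };
    const float ly[4] = { 0.0f, 0.0f, h, h };
    for (int i = 0; i < 4; i++) {
        const float px = (lx[i] - cx) * sx;
        const float py = (ly[i] - cy) * sy;
        q->x[i] = px * cos_a - py * sin_a + dx;
        q->y[i] = px * sin_a + py * cos_a + dy;
    }
}

/* Fast path for unrotated sprites: two edges per axis, no rotation math. */
static void quad_unrotated(Quad *q, float w, float h, float cx, float cy,
                           float sx, float sy, float dx, float dy)
{
    const float left = dx - cx * sx, right = left + w * sx;
    const float top = dy - cy * sy, bottom = top + h * sy;
    q->x[0] = left;  q->y[0] = top;
    q->x[1] = right; q->y[1] = top;
    q->x[2] = right; q->y[2] = bottom;
    q->x[3] = left;  q->y[3] = bottom;
}
```

The caller would pick quad_unrotated whenever the sprite's angle is zero and fall back to quad_rotated otherwise; both produce the same corners in that case.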

I'm looking for tricks like this to push the performance to the limits. They can be nasty tricks, they can modify Allegro code, they can be heavily customised.

Any advice would be appreciated.

ph03nix

Perhaps instead you could use al_use_transform() with your own ALLEGRO_TRANSFORM matrix and just draw a tinted bitmap instead. You can modify the values of the matrix yourself or use al_scale_transform, al_rotate_transform, and al_translate_transform (and al_identity_transform of course).

If performance is critical I would suggest manipulating the array yourself since it's a 4x4 array but you only need to modify the 2x2 part for 2D, and those functions I mentioned probably multiply two 4x4 matrices together.

EDIT: for sprites not using rotation and scaling, you can simply skip all the matrix stuff and al_use_transform
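
A rough sketch of writing the 2D part of the matrix by hand. The Transform struct below is only a stand-in with the same float m[4][4] layout as ALLEGRO_TRANSFORM, and the index convention (translation in row 3) follows my understanding of how al_transform_coordinates reads the matrix, so check it against your Allegro source. All function names here are hypothetical.

```c
/* Stand-in with the same float m[4][4] layout as ALLEGRO_TRANSFORM. */
typedef struct { float m[4][4]; } Transform;

static void transform_identity(Transform *t)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            t->m[i][j] = (i == j) ? 1.0f : 0.0f;
}

/* Overwrite only the six entries that a 2D scale + rotate + translate
   touches. Assumes the rest of the matrix already holds identity values. */
static void transform_set_2d(Transform *t, float sx, float sy,
                             float cos_a, float sin_a, float dx, float dy)
{
    t->m[0][0] = sx * cos_a;  t->m[0][1] = sx * sin_a;
    t->m[1][0] = -sy * sin_a; t->m[1][1] = sy * cos_a;
    t->m[3][0] = dx;          t->m[3][1] = dy;
}

/* Apply the 2D part of the matrix to a point. */
static void transform_point(const Transform *t, float *x, float *y)
{
    const float px = *x, py = *y;
    *x = px * t->m[0][0] + py * t->m[1][0] + t->m[3][0];
    *y = px * t->m[0][1] + py * t->m[1][1] + t->m[3][1];
}
```

With a real ALLEGRO_TRANSFORM you would fill the same six entries and pass it to al_use_transform; six stores replace the three 4x4 multiplications the helper functions would otherwise perform.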

kazzmir

Just to avoid a dumb situation, are you using an optimized build of Allegro?

SiegeLord
ph03nix said:

Perhaps instead you could use al_use_transform() with your own ALLEGRO_TRANSFORM matrix and just draw a tinted bitmap instead.

All al_draw_*_bitmap calls use the same function (al_draw_tinted_scaled_rotated_bitmap_region), so you are unlikely to get any speedups by switching functions.

kovarex said:

Any advice would be appreciated.

I made a primitives addon replacement for held drawing that does not fiddle with transformations: https://github.com/SiegeLord/FastDraw. I found it to be faster than held drawing (about 3x on my machine). I am implementing vertex buffers for Allegro and will try them for the same purpose. In a different test, they were 1.5x faster than al_draw_prim. So, it might be the case that I'll get nearly 5x faster drawing than default Allegro by using this approach.

kovarex

One of the time-consuming tasks in our render preparation is sorting the sprites to be drawn to get the isometric view.
If I understand correctly, the 2D is in fact rendered as 3D. Couldn't we use some trick to set the depth of the bitmaps by some formula and avoid sorting them? Then they would be sorted by the hardware almost for free.

kazzmir:
We compile allegro from source as part of the project (our allegro is already modified), so yes.

ph03nix:
This seems to be a good idea.

SiegeLord:
This looks interesting!
I will probably need to study how the internals and transformations work so that I don't ask potentially stupid questions in the future. Could it be extended to support rotating of bitmaps as well?

taron
kovarex said:

If I understand correctly, the 2D is in fact rendered as 3D. Couldn't we use some trick to set the depth of the bitmaps by some formula and avoid sorting them? Then they would be sorted by the hardware almost for free.

Draw the bitmaps on different Z values with depth buffering enabled. Don't ask me how to do that though, I barely know anything about DirectX and even less about OpenGL. I do know that you'll still have to sort any transparent bitmaps yourself.
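
One possible shape for the "depth by formula" part, as a sketch only. The thread does not say which sort key the game uses, so this assumes sprites are ordered by a single y coordinate and that a "less than" depth test is in effect. The function name and the [0, 1] depth range are hypothetical.

```c
/* Hypothetical mapping from a sprite's sort key (here its ground-line y
   coordinate) to a depth value in [0, 1]. Larger y means closer to the
   viewer, so it gets a smaller depth and wins a "less than" depth test. */
static float depth_from_y(float y, float min_y, float max_y)
{
    if (max_y <= min_y)
        return 0.5f;                 /* degenerate range: one shared depth */
    float t = (y - min_y) / (max_y - min_y);
    if (t < 0.0f) t = 0.0f;          /* clamp keys outside the visible range */
    if (t > 1.0f) t = 1.0f;
    return 1.0f - t;
}
```

Opaque sprites could then be submitted in any order with this value as their z coordinate. Sprites with partial transparency would still need to be sorted and drawn back to front.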

kovarex

OK, so after a whole day of profiling, digging and fiddling with the code, I achieved a 2.5x speed improvement of the rendering method (the call to al_draw_tinted_scaled_rotated_bitmap_region).
Now I can go up to 40k sprites while keeping 60 FPS, and the limit is there only because the sprite preparation and sorting for the render are now the biggest bottleneck, so after some other changes I think it could go to 60k or more.

Some of these optimisations were very custom and the result of tighter integration of our rendering method with Allegro, but a big part of them could be applied to Allegro to make it generally faster, I believe.
I might propose a patch later.

These changes were the most important:

  • The backup could easily be removed, as long as I initialised the identity transform when drawing primitives, and the overall gain is big

  • In the D3D drawing, it checks whether the VERTICAL/HORIZONTAL flip is active, but that is already dealt with in the Allegro method (and the flag is turned off), so these ifs are always false and can be removed

  • The blender was applied to every sprite. I removed it completely because of the integration, but a simple condition checking whether it needs to be applied would help anyway (it takes time)

  • The internal quad drawing called al_get_current_transform and the colour converting functions four times in a row, while it could fetch the results once and reuse them (this really speeds things up a lot)

  • The internal quad function could be integrated into the d3d drawing function
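
The "fetch once and reuse" change from the list above can be sketched like this. It is not the actual Allegro code: get_current_transform is a hypothetical stand-in for al_get_current_transform, with a counter added only to make the difference visible.

```c
/* Stand-in for the library's transform type and its current-transform getter. */
typedef struct { float m[4][4]; } Transform;

static Transform g_current = {{
    { 1, 0, 0, 0 }, { 0, 1, 0, 0 }, { 0, 0, 1, 0 }, { 0, 0, 0, 1 }
}};
static int g_lookups = 0;   /* counts how often the getter is called */

static const Transform *get_current_transform(void)
{
    g_lookups++;
    return &g_current;
}

static void transform_point(const Transform *t, float *x, float *y)
{
    const float px = *x, py = *y;
    *x = px * t->m[0][0] + py * t->m[1][0] + t->m[3][0];
    *y = px * t->m[0][1] + py * t->m[1][1] + t->m[3][1];
}

/* Before: the getter is called again for every vertex of the quad. */
static void quad_per_vertex_lookup(float xs[4], float ys[4])
{
    for (int i = 0; i < 4; i++)
        transform_point(get_current_transform(), &xs[i], &ys[i]);
}

/* After: one lookup, reused for all four vertices. */
static void quad_hoisted_lookup(float xs[4], float ys[4])
{
    const Transform *t = get_current_transform();
    for (int i = 0; i < 4; i++)
        transform_point(t, &xs[i], &ys[i]);
}
```

The same hoisting applies to the colour conversion: convert the tint once per quad and copy the result into the four vertices.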

Custom optimisations were mainly:
I created all the needed functions with the postfix "_optimised" and used some other tricks, such as global transform and bitmap_target objects (avoiding the apply_transform methods etc., which also slow things down). I know it is ugly, but it helps.

I reduced the number of method calls by taking the internals of the public Allegro draw method and the drawer method and using their code directly in my system draw routine, as well as merging some methods.

All my sprites are sub-bitmaps (parts of atlases), so I could remove those ifs that check for sub-bitmaps.

I removed all the branches we never use (non-accelerated drawing, drawing from the backbuffer and similar); smaller functions are better for cache hits.

Dizzy Egg

What compiler flags are you using? :P

axilmar
kovarex said:

This way we managed to get quite far (~27 000 sprites drawn per tick at 60 FPS) but it is still not enough with maximum zoomout in crowded places.

Have you thought about using a different algorithm? For example, using lower-resolution textures for lower zoom levels?
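
As a sketch of what I mean, assuming each atlas exists in several pre-scaled versions where level n is half the resolution of level n - 1 (the function name and the halving scheme are only assumptions for illustration):

```c
/* Pick a pre-scaled atlas level for a zoom factor (1.0 = native size).
   Level 0 is full resolution; each further level halves the resolution.
   A level is chosen only when it still has at least one texel per pixel. */
static int lod_for_zoom(float zoom, int max_level)
{
    int level = 0;
    while (level < max_level && zoom <= 0.5f) {
        zoom *= 2.0f;   /* moving to a half-size atlas doubles the effective zoom */
        level++;
    }
    return level;
}
```

At zoom 0.25, for example, this picks level 2, so each sprite touches a sixteenth of the texels it would at full resolution.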

SiegeLord
kovarex said:

Could it be extended to support rotating of bitmaps as well?

Yes, sure.

kovarex said:

* The backup could easily be removed, as long as I initialised the identity transform when drawing primitives, and the overall gain is big

I'm not sure I understand... does this work only if you don't use transformations, or will it also work if the user has a non-identity transformation set? If not, maybe we could detect the identity transform and take a "fast" path if it's active.
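
Roughly what I have in mind, as a sketch rather than a patch. Transform is a stand-in for ALLEGRO_TRANSFORM, and the function names are made up:

```c
#include <stdbool.h>

/* Stand-in with the same float m[4][4] layout as ALLEGRO_TRANSFORM. */
typedef struct { float m[4][4]; } Transform;

/* True when every entry matches the identity matrix exactly. */
static bool transform_is_identity(const Transform *t)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            if (t->m[i][j] != ((i == j) ? 1.0f : 0.0f))
                return false;
    return true;
}

/* Transform a 2D point, skipping the arithmetic on the identity fast path.
   The flag is meant to be computed once, not once per vertex. */
static void transform_point_fast(const Transform *t, bool is_identity,
                                 float *x, float *y)
{
    if (is_identity)
        return;
    const float px = *x, py = *y;
    *x = px * t->m[0][0] + py * t->m[1][0] + t->m[3][0];
    *y = px * t->m[0][1] + py * t->m[1][1] + t->m[3][1];
}
```

The check compares sixteen floats, so it would only pay off if it ran when the transform is set and the result were cached, rather than running on every draw call.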

Quote:

* In the D3D drawing, it checks whether the VERTICAL/HORIZONTAL flip is active, but that is already dealt with in the Allegro method (and the flag is turned off), so these ifs are always false and can be removed
* The internal quad drawing called al_get_current_transform and the colour converting functions four times in a row, while it could fetch the results once and reuse them (this really speeds things up a lot)

These two probably can be applied to the non-optimized Allegro functions, no?

Quote:

* The internal quad function could be integrated into the d3d drawing function

Not sure what you mean here.

Thread #612910. Printed from Allegro.cc