Allegro.cc - Online Community

Allegro 5 performance
Andrew Gillett
Member #15,868
January 2015

I've been developing a game using Allegro 4. Recently I tried switching to Allegro 5 due to some issues which I'll write about in a separate post. However, the performance is much, much worse. In Allegro 4 I was writing all graphics to a memory bitmap, then using AllegroGL to scale the backbuffer to the screen. In Allegro 5 I'm using al_draw_scaled_bitmap to display the backbuffer. With memory bitmaps it's incredibly slow, but even with video bitmaps it's far too slow.

Comparison of profiling results (render1 = write sprites to backbuffer, render2 = scale backbuffer to screen)
(This is on a level where 2000 sprites are being displayed per frame)

Allegro 4:
Render1: 12ms, Render2: 10ms

Allegro 5, new bitmap flags = ALLEGRO_ALPHA_TEST
Render1: 30.9ms Render2: 13.6ms

As above but also with ALLEGRO_NO_PRESERVE_TEXTURE
Render1: 71.4ms Render2: 10.4ms

With ALLEGRO_OPENGL, but without ALLEGRO_NO_PRESERVE_TEXTURE
Render1: 95.9ms Render2: 15.3ms

Turned off alpha channel (will be ok for most sprites):
Render1: 28.9ms Render2: 15.1ms

Attached is a screenshot of the level. This level is much bigger than the standard one, but even on smaller levels the performance is terrible.

The tile sprites, which make up the majority of the sprites in this image, are 32x32.

dthompson
Member #5,749
April 2005

This sounds symptomatic of memory bitmaps being used elsewhere (ie. as part of your pipeline - not with texture preservation). Allegro 5 isn't good at memory bitmap stuff :P

So - are you using memory bitmaps anywhere in your Allegro 5 code when you say "even with video bitmaps it's far too slow"?

______________________________________________________
Website. It was freakdesign.bafsoft.net.
This isn't a game!

Andrew Gillett
Member #15,868
January 2015

There are only 2 calls to al_create_bitmap and 1 call to al_load_bitmap in the entire codebase, and al_set_new_bitmap_flags is called before each of them. If I call al_get_bitmap_flags before each draw, I get 0x410 (ALLEGRO_VIDEO_BITMAP | ALLEGRO_ALPHA_TEST) for most of them, but the display gets 0x400 (ALLEGRO_VIDEO_BITMAP).

Chris Katko
Member #1,881
January 2002

It's likely you're doing something wrong. Weird drawing code. Locking bitmaps (which forces copying to memory). Etc.

I get easily 100x performance with Allegro 5 with a similar tiled bitmap game on a tiny netbook.

Is this Linux, Windows, or Mac OS X?

-----sig:
“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs
"Political Correctness is fascism disguised as manners" --George Carlin

MikiZX
Member #17,092
June 2019

I believe you have likely done some testing but just in case this can help - if you are working with tilemaps and Allegro5 then a good solution to speed things up (considerably) would be to modify your tilemap drawing code to use vertex buffer objects (you can check a non-optimized version of this here https://github.com/mikiZX/Allegro5-2d-Platformer-Demo-using-VBOs-and-Tiled-tilemaps ).
You could also optimize this (and likely your actual code) by splitting your tilemap into smaller sections and drawing only the ones that are actually visible on the screen (for example, segments of many 32x32 tiles each), in case you are currently drawing all of them each frame while only part is visible.
As suggested, with Allegro 5 some drawing operations on video bitmaps (al_put_pixel, for example) require the bitmap to be locked before the operation is performed. If you call al_put_pixel many times without locking the bitmap first, this could slow things down a lot.
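For the "only draw what is visible" suggestion, a minimal sketch in plain C of how the visible tile range could be computed from a camera rectangle. The names `TileRange` and `visible_tiles` are hypothetical and not part of Allegro, and the sketch assumes square tiles and a view that overlaps the map.

```c
#include <assert.h>

/* Inclusive range of tile columns/rows that overlap the view. */
typedef struct {
    int first_col, last_col;
    int first_row, last_row;
} TileRange;

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* cam_x, cam_y: top-left corner of the view in map pixels.
 * view_w, view_h: size of the view in map pixels.
 * Assumes the view overlaps the map at least partially. */
TileRange visible_tiles(int cam_x, int cam_y, int view_w, int view_h,
                        int tile_size, int map_cols, int map_rows)
{
    assert(tile_size > 0 && map_cols > 0 && map_rows > 0);
    assert(view_w > 0 && view_h > 0);

    TileRange r;
    r.first_col = clamp_int(cam_x / tile_size, 0, map_cols - 1);
    r.last_col  = clamp_int((cam_x + view_w - 1) / tile_size, 0, map_cols - 1);
    r.first_row = clamp_int(cam_y / tile_size, 0, map_rows - 1);
    r.last_row  = clamp_int((cam_y + view_h - 1) / tile_size, 0, map_rows - 1);
    return r;
}
```

The drawing loop would then iterate only from `first_row` to `last_row` and `first_col` to `last_col` instead of over the whole map.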

Andrew Gillett
Member #15,868
January 2015

I am on Windows. I am not calling al_lock_bitmap, and all put/get_pixel calls are commented out.

I have done some more tests, and found that even when nothing was being drawn, a single call to OutputDebugString (Microsoft's function for writing to Visual Studio's debug output) caused the enclosing profiling region to randomly vary between low numbers (eg 0.5ms, still not great for a single debug print but not out of the ordinary for that function) and very high numbers, sometimes over 10ms - see attached image.

Problem solved? I re-enabled the drawing code and commented out OutputDebugString, but performance was no better - still in the region of 40-80ms per frame. Both drawing the sprites to the backbuffer, and drawing the backbuffer to the screen, vary by up to a factor of 2, although the latter is more stable.

Does it matter that the backbuffer does not have power of 2 width/height?

MikiZX
Member #17,092
June 2019

For sure the Allegro wizards that follow this forum will be able to provide a definite answer but I've searched Allegro's GitHub repository for 'power-of-two' and 'pot' and it says this (seems to be the case for both OpenGL and DirectX):
"* Also, we do support old OpenGL drivers where textures must have
* power-of-two dimensions. If a non-power-of-two bitmap is created in
* such a case, we use a texture with the next larger POT dimensions,
* and just keep some unused padding space to the right/bottom of the
* pixel data."
So it appears this is handled by Allegro internally so should not be the issue here.
I would try profiling the code to pinpoint the exact section that runs slowly. There are likely better ways, but as a noob I would keep a global variable and store the system time in it every so often throughout the code. Each time, just before storing a new value, check the difference between the current time and the stored value; if the difference is large enough, print out a debug message.
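A rough sketch of that checkpoint idea in plain C. The function names are hypothetical, and `clock()` is used only because it is in the standard library; it measures CPU time, so a wall-clock, high-resolution timer from the platform would be a better choice in a real game.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Time of the previous checkpoint, in milliseconds. */
static double last_checkpoint_ms = 0.0;

/* Current time in milliseconds (CPU time; a stand-in for a better timer). */
double now_ms(void)
{
    return (double)clock() * 1000.0 / CLOCKS_PER_SEC;
}

/* Records a checkpoint at time t_ms and returns the milliseconds elapsed
 * since the previous checkpoint. Prints a message to stderr when the
 * elapsed time exceeds threshold_ms. */
double checkpoint_at(double t_ms, const char *label, double threshold_ms)
{
    assert(label != NULL);
    double elapsed = t_ms - last_checkpoint_ms;
    last_checkpoint_ms = t_ms;
    if (elapsed > threshold_ms)
        fprintf(stderr, "slow section before '%s': %.1f ms\n", label, elapsed);
    return elapsed;
}

/* Convenience wrapper that samples the clock itself. */
double checkpoint(const char *label, double threshold_ms)
{
    return checkpoint_at(now_ms(), label, threshold_ms);
}
```

Calling `checkpoint("after sprites", 5.0)` at a few places in the frame would flag any stretch of code that took longer than 5 ms since the previous call.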

Edgar Reynaldo
Major Reynaldo
May 2007

In Allegro 5 you just don't use memory bitmaps, hardly ever.

The A5 way is drawing from a tile atlas. You have something like 22 unique tiles on that screen at 32x32; laid out 2 across by 11 down, that's 64x352 pixels, which would easily fit in an atlas that was 512x512 or larger.
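To make the atlas arithmetic concrete, a small sketch in plain C that maps a tile index to its source rectangle inside an atlas. `Rect` and `atlas_tile_rect` are hypothetical helpers; in Allegro 5 the resulting rectangle could be passed as the source region to al_draw_bitmap_region, or used to create a sub-bitmap once at load time.

```c
#include <assert.h>

/* Source rectangle of one tile inside the atlas, in pixels. */
typedef struct {
    int x, y, w, h;
} Rect;

/* Tiles are assumed to be square and packed left-to-right, top-to-bottom,
 * with tiles_per_row tiles in each atlas row and no padding. */
Rect atlas_tile_rect(int index, int tiles_per_row, int tile_size)
{
    assert(index >= 0 && tiles_per_row > 0 && tile_size > 0);

    Rect r;
    r.x = (index % tiles_per_row) * tile_size;
    r.y = (index / tiles_per_row) * tile_size;
    r.w = tile_size;
    r.h = tile_size;
    return r;
}
```

With 22 tiles packed 2 per row, the last tile (index 21) starts at (32, 320) and ends at y = 352, matching the 64x352 figure above.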

8-)

And of course OutputDebugString will slow your program down writing to the console like that. It's much faster to write to a file.

EDIT
Show game loop and rendering code.

kenmasters1976
Member #8,794
July 2007

The A5 way is drawing from a tile atlas

What's the difference between using a 512x512 atlas (together with sub-bitmaps, I assume) and using 22 different source bitmaps? Does it affect performance?
Also, if I remember correctly, sub-bitmaps don't support tiling so, when you need tiling, you can't use them.

Quote:

Show game loop and rendering code

I second this, could be useful for doing a benchmark on different systems. I've noticed that Allegro 5 performance drops when drawing lots of bitmaps but I always blamed it on having an old graphics card and using open source drivers. I've never done any serious benchmark but I've noticed CPU usage increasing considerably when drawing lots of bitmaps, which always seemed a bit off to me considering that Allegro 5 is supposed to do its drawing on the GPU but, as I said, I've never done any serious testing and just considered using the low level primitives in cases when you need to draw lots of bitmaps.

dthompson
Member #5,749
April 2005

What's the difference between using a 512x512 atlas (together with sub-bitmaps, I assume) and using 22 different source bitmaps? Does it affect performance?

Using 22 separate bitmaps probably won't be an issue, but once you start getting into the hundreds, I'd imagine you'd start seeing a serious dip in performance (or increased memory usage). I'm being quite unscientific here though; I'm just aware that it's better to have fewer discrete textures.

Quote:

I've never done any serious testing and just considered using the low level primitives in cases when you need to draw lots of bitmaps.

Yes indeed - it'll be even faster if you're using vertex buffers.

Polybios
Member #12,293
October 2010

What's the difference between using a 512x512 atlas (together with sub-bitmaps, I assume) and using 22 different source bitmaps? Does it affect performance?

Yes! IIRC, drawing from different source bitmaps is the equivalent of several OpenGL texture changes. This will slow down drawing and should be avoided if possible (e.g. use an atlas with sub bitmaps, or even sort by source bitmap if possible).

So it's not about the number of source bitmaps but the number of switches between them. IIRC, this should be addressed before attempting to use vertex buffers.

The GPU is happiest if you set up "state" once and then only send geometry or alter matrices.
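A small illustration in plain C of the "switches, not count" point: it counts how often consecutive draw commands use a different source texture, and sorts a draw list by texture to minimise that number. `DrawCmd` and `texture_id` are hypothetical stand-ins for whatever the game uses to identify a source bitmap. Sorting like this is only safe when draw order does not matter (for instance, tiles that do not overlap), and `qsort` does not preserve the original order within a texture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One queued sprite draw. texture_id is a hypothetical handle that
 * identifies the source bitmap. */
typedef struct {
    int texture_id;
    float x, y;
} DrawCmd;

/* Number of times the source texture changes between consecutive draws. */
size_t count_texture_switches(const DrawCmd *cmds, size_t n)
{
    assert(cmds != NULL || n == 0);
    size_t switches = 0;
    for (size_t i = 1; i < n; i++) {
        if (cmds[i].texture_id != cmds[i - 1].texture_id)
            switches++;
    }
    return switches;
}

static int by_texture(const void *a, const void *b)
{
    int ta = ((const DrawCmd *)a)->texture_id;
    int tb = ((const DrawCmd *)b)->texture_id;
    return (ta > tb) - (ta < tb);
}

/* Groups draws that share a texture so each texture is bound once. */
void sort_by_texture(DrawCmd *cmds, size_t n)
{
    assert(cmds != NULL || n == 0);
    if (n > 1)
        qsort(cmds, n, sizeof *cmds, by_texture);
}
```

With a single atlas every command has the same `texture_id`, so the switch count is zero without any sorting.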

MikiZX
Member #17,092
June 2019

I've never done any serious benchmark but I've noticed CPU usage increasing considerably when drawing lots of bitmaps, which always seemed a bit off to me considering that Allegro 5 is supposed to do its drawing on the GPU

I doubt this would be due to your graphics card (unless it is over maybe 15 years old). The likely reason the CPU usage would increase is due to 'feeding' the GPU with the data it needs to draw - sort of 'bottlenecking' the GPU-CPU data transfer on the CPU side as CPU will not be able to feed the data to the GPU at the rate at which the GPU can draw. This is I believe the main reason one would like to have the CPU send data only once to the GPU (using VBO) as opposed to sending the data many times (once per bitmap you wish to draw). From my experience (not very extensive, mind you) it is faster to give the CPU task of re-creating the VBO each frame and then drawing that than actually sending each bitmap individually. The atlas idea is used here as one VBO draw call will be bound to only one bitmap(or a single set of bitmaps if a special shader is used) - if you want to draw different bitmaps with VBO you will need to create one VBO per bitmap you will use (and this will again increase the amount of cpu-gpu talk). Thus packing all your bitmaps into a single one and having only one VBO draw call will/should be the fastest way.
Also, what was said above about state changes is a factor too, one that is again mitigated by using a single VBO.
As for tiling, this can be achieved with VBO though it would require adding additional geometry to simulate the tiling effect (as opposed to simply increasing the UV coordinates which I believe is what one would normally do - if I understand your question correctly).
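As a sketch of what "re-creating the vertex data each frame" might look like, the plain C below appends two triangles per tile to one array, with texture coordinates pointing into a single atlas. The `Vertex` struct and `push_tile_quad` are hypothetical; in Allegro 5 the primitives addon's ALLEGRO_VERTEX and al_draw_prim would play the corresponding roles, and texture coordinates are assumed here to be in atlas pixels.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal vertex: screen position plus texture coordinate in atlas pixels. */
typedef struct {
    float x, y;
    float u, v;
} Vertex;

/* Appends one square tile as two triangles (6 vertices).
 * dx, dy: destination top-left; su, sv: top-left of the tile in the atlas.
 * Returns the new vertex count. */
size_t push_tile_quad(Vertex *out, size_t count, size_t capacity,
                      float dx, float dy, float size, float su, float sv)
{
    assert(out != NULL && count + 6 <= capacity);

    float x0 = dx, y0 = dy, x1 = dx + size, y1 = dy + size;
    float u0 = su, v0 = sv, u1 = su + size, v1 = sv + size;

    Vertex quad[6] = {
        {x0, y0, u0, v0}, {x1, y0, u1, v0}, {x1, y1, u1, v1},
        {x0, y0, u0, v0}, {x1, y1, u1, v1}, {x0, y1, u0, v1},
    };
    for (int i = 0; i < 6; i++)
        out[count + i] = quad[i];
    return count + 6;
}
```

After filling the array for all visible tiles, the whole batch can be submitted in one draw call bound to the atlas, instead of one call per tile.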

Edgar Reynaldo
Major Reynaldo
May 2007

kenmasters1976
Member #8,794
July 2007

Well, that's some interesting info on VBOs which is not immediately obvious when using Allegro 5, in particular if you come from using Allegro 4 in the past. I always thought 'hardware accelerated' meant the graphics card would take care of all the drawing with little to no load on the CPU; I also assumed that loading a bitmap as video bitmap meant it was loaded as a texture on video memory, from where the graphics card could access it and draw it with no load on the CPU either, even when using the traditional Allegro 4 way of drawing things, but apparently it is passing the coordinates/geometry to the GPU that can slow down the process? This requires a whole new way of thinking about the drawing process in Allegro 5.

Chris Katko
Member #1,881
January 2002

1. Can you just post the whole project so we can profile it?
2. What are your hardware specs?

I have no problems holding 60 FPS... on a netbook... with an i3 celeron and intel HD graphics while drawing multiple tile layers (floors, walls, "decals", "dirt", and lighting layers) in 1366x768.

I don't use VBOs or display lists or anything "fancy".

It's even possible you're measuring wrong. If you're running Windows, use one of those nVidia (or Windows [Windows Key + G] or whatever) FPS meters. Linux might have an equivalent.

Also, what's your video mode? You're not in some kind of creepy 24-bit mode? (somehow different than your images in memory, forcing a color conversion every frame.)

Letting other people compile and profile your project will eliminate many of these variables.

Andrew Gillett
Member #15,868
January 2015

I've done some more profiling and I'm getting better results than I had been getting before – so it's possible there was something else going on like a rogue logging call somewhere.

Here are some profiling results from a simplified test where it draws around 2000 32x32 sprites per frame. Render1 is drawing the sprites and Render2 is scaling the backbuffer to the screen. In this test, the screen resolution is 1680x1050 and the back buffer is 1888x1120 (although most levels will be considerably smaller than this).

Allegro 5
Render1: 10.0ms Render2: 18.8ms Frame: 28.8ms
Render1: 9.4ms Render2: 20.9ms Frame: 30.3ms
Render1: 10.0ms Render2: 20.3ms Frame: 30.3ms
Render1: 8.8ms Render2: 23.8ms Frame: 32.6ms
Render1: 8.6ms Render2: 16.9ms Frame: 25.5ms
Render1: 9.9ms Render2: 18.9ms Frame: 28.8ms
Render1: 9.7ms Render2: 20.8ms Frame: 30.6ms

Allegro 4 (Render2 uses allegro_gl_make_texture_ex and draws a quad, Render1 uses draw_rle_sprite)
Render1: 13.6ms Render2: 8.9ms Frame: 22.5ms
Render1: 13.6ms Render2: 8.9ms Frame: 22.5ms
Render1: 13.7ms Render2: 8.9ms Frame: 22.5ms
Render1: 13.7ms Render2: 8.9ms Frame: 22.5ms
Render1: 13.6ms Render2: 8.9ms Frame: 22.5ms
Render1: 13.6ms Render2: 8.9ms Frame: 22.5ms
Render1: 13.6ms Render2: 8.9ms Frame: 22.4ms
Render1: 13.6ms Render2: 8.9ms Frame: 22.5ms

In this case you can see that Allegro 5 is faster at drawing the small sprites but slower at scaling the back buffer to the screen as compared to the AllegroGL code I'm using in the old version. I'm going to look into the practicality of scaling the sprites directly to the screen rather than going through the additional render2 step.

The players in my game are drawn by creating a temporary bitmap, copying the relevant head and body parts to the bitmap, and then drawing that to the screen – with additional steps if lighting and/or transparency are needed. In the Allegro 4 version they are also modified in real time using get/put pixel (primarily replacing the default player colour with the desired player colour, but also sometimes for a shield effect which puts a white outline around the player). I know this kind of pixel replacement is a non-starter for Allegro 5. Even without any lighting or transparency effects, in my test the Allegro 5 version averages 8ms to draw 16 players (49 draw sprite calls), while the Allegro 4 version runs about 20% faster. Given that the pixel replacement stuff is not feasible for Allegro 5, I will have to generate and store the sprites at the start of the level, so that should help performance.
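A minimal sketch in plain C of doing the colour replacement once, on a CPU-side pixel buffer, rather than per frame. The function name `recolour` and the packed 32-bit pixel layout are assumptions for illustration; how the buffer is obtained from and uploaded back to a bitmap is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Replaces every pixel equal to `from` with `to` in a buffer of packed
 * 32-bit pixels. Returns the number of pixels changed. Intended to run
 * once per player colour at level start, not every frame. */
size_t recolour(uint32_t *pixels, size_t count, uint32_t from, uint32_t to)
{
    assert(pixels != NULL || count == 0);

    size_t changed = 0;
    for (size_t i = 0; i < count; i++) {
        if (pixels[i] == from) {
            pixels[i] = to;
            changed++;
        }
    }
    return changed;
}
```

One recoloured copy per player colour could then be stored and drawn as an ordinary sprite for the rest of the level.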

Both machines I've tested on are around six or seven years old and have Intel on-board graphics. Timing uses QueryPerformanceCounter.

Something strange I noticed is that when I request a list of graphics modes in Allegro 5, all the modes are reported as having 23 bit colour depth. However, when I print the actual bitmap colour depths, they are all 32.

dthompson
Member #5,749
April 2005

Just to be absolutely sure: what frame rates are you getting? Could any of this be vsync-related? I notice that all of the Allegro 5 'Render2' timings are over 16.6ms (some of them very close), which is about the frame time of a 60Hz display with vsync: 1000ms / 60 ≈ 16.7ms.
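For reference, the conversion between refresh rate, frame budget and frames per second, as a tiny plain C sketch (the function names are made up for illustration):

```c
#include <assert.h>

/* Time available per frame, in milliseconds, at a given refresh rate. */
double frame_budget_ms(double refresh_hz)
{
    assert(refresh_hz > 0.0);
    return 1000.0 / refresh_hz;
}

/* Frames per second implied by a measured frame time in milliseconds. */
double fps_from_frame_ms(double frame_ms)
{
    assert(frame_ms > 0.0);
    return 1000.0 / frame_ms;
}
```

A 28.8ms frame, like the first Allegro 5 sample above, works out to a little under 35 frames per second.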

Andrew Gillett
Member #15,868
January 2015

vsync is off. I just tested it on a much better PC and got these timings (window resolution was 1768x992):

Render1: 6.0ms Render2: 3.9ms Frame: 9.9ms
Render1: 6.0ms Render2: 4.0ms Frame: 10.0ms
Render1: 6.0ms Render2: 4.0ms Frame: 10.0ms
Render1: 6.0ms Render2: 3.9ms Frame: 9.9ms
Render1: 6.1ms Render2: 4.0ms Frame: 10.1ms
Render1: 6.0ms Render2: 4.0ms Frame: 10.0ms
Render1: 6.0ms Render2: 4.2ms Frame: 10.2ms
Render1: 6.2ms Render2: 3.9ms Frame: 10.2ms
Render1: 6.0ms Render2: 3.9ms Frame: 9.9ms

kenmasters1976
Member #8,794
July 2007

I'm going to look into the practicality of scaling the sprites directly to the screen rather than going through the additional render2 step

In my recent Allegro 5 project I set up an Allegro transformation so that all drawing is automatically scaled to fit the screen/window size. It seems to work pretty well for 2D.
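A sketch in plain C of the numbers such a transformation needs: a uniform scale that fits the game area inside the display, and the offsets that centre it. `FitTransform` and `fit_to_display` are hypothetical names; in Allegro 5 the values would typically be fed to al_scale_transform and al_translate_transform before calling al_use_transform.

```c
#include <assert.h>

/* Uniform scale and centring offsets for fitting a source area
 * into a destination area while keeping the aspect ratio. */
typedef struct {
    float scale;
    float offset_x, offset_y;
} FitTransform;

FitTransform fit_to_display(float src_w, float src_h, float dst_w, float dst_h)
{
    assert(src_w > 0.0f && src_h > 0.0f && dst_w > 0.0f && dst_h > 0.0f);

    float sx = dst_w / src_w;
    float sy = dst_h / src_h;

    FitTransform t;
    t.scale = sx < sy ? sx : sy;                      /* limiting axis wins */
    t.offset_x = (dst_w - src_w * t.scale) / 2.0f;    /* centre horizontally */
    t.offset_y = (dst_h - src_h * t.scale) / 2.0f;    /* centre vertically */
    return t;
}
```

For a 320x240 game area on a 1280x720 display this gives a scale of 3 with 160 pixels of border on the left and right.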

I just tested it on a much better PC and got these timings

Is the Allegro 4 timing still better?

Andrew Gillett
Member #15,868
January 2015

Timings from a Paperspace cloud PC, which is faster than mine:

al5
Render1: 9.0ms Render2: 4.9ms Frame: 13.9ms
Render1: 9.8ms Render2: 5.1ms Frame: 14.9ms
Render1: 11.0ms Render2: 5.5ms Frame: 16.5ms
Render1: 10.9ms Render2: 4.4ms Frame: 15.3ms
Render1: 9.2ms Render2: 4.3ms Frame: 13.5ms
Render1: 8.7ms Render2: 4.7ms Frame: 13.4ms
Render1: 8.7ms Render2: 4.4ms Frame: 13.2ms
Render1: 9.8ms Render2: 4.1ms Frame: 13.9ms
Render1: 9.3ms Render2: 4.0ms Frame: 13.3ms
Render1: 9.1ms Render2: 4.1ms Frame: 13.2ms
Render1: 9.8ms Render2: 4.6ms Frame: 14.4ms
Render1: 9.8ms Render2: 5.6ms Frame: 15.4ms
Render1: 9.3ms Render2: 4.0ms Frame: 13.3ms

al4
Render1: 11.0ms Render2: 2.6ms Frame: 13.6ms
Render1: 11.0ms Render2: 2.6ms Frame: 13.6ms
Render1: 10.9ms Render2: 2.6ms Frame: 13.5ms
Render1: 10.8ms Render2: 2.5ms Frame: 13.4ms
Render1: 10.8ms Render2: 2.5ms Frame: 13.3ms
Render1: 10.8ms Render2: 2.5ms Frame: 13.2ms
Render1: 10.7ms Render2: 2.4ms Frame: 13.2ms
Render1: 10.6ms Render2: 2.4ms Frame: 13.1ms
Render1: 10.6ms Render2: 2.4ms Frame: 13.0ms
Render1: 10.5ms Render2: 2.4ms Frame: 13.0ms

EDIT: I originally wrote that the Al5 version varies a lot in terms of frame timings but I just realised that the Al4 version is showing smoothed out timings (as this text was originally written to the screen and is hard to read if the numbers are varying a lot every frame), whereas I changed this in the Al5 version.

I tried scaling sprites directly to the screen rather than using an intermediate backbuffer. These are the timings I got on my desktop - a little better than before but not much.

Render1: 8.6ms Render2: 13.9ms Frame: 22.6ms
Render1: 10.0ms Render2: 18.2ms Frame: 28.2ms
Render1: 8.4ms Render2: 22.7ms Frame: 31.1ms
Render1: 8.8ms Render2: 15.6ms Frame: 24.4ms
Render1: 11.2ms Render2: 16.8ms Frame: 28.1ms
Render1: 9.3ms Render2: 20.6ms Frame: 29.9ms
Render1: 8.6ms Render2: 15.0ms Frame: 23.6ms

The problem is that the scaling ratio is not a whole number, so in this level I end up with a 1 pixel gap every 2 or 3 tiles.
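One common way to avoid gaps with a non-integer ratio is to compute each tile's destination edges from shared, rounded-down boundaries, so that tile i ends exactly where tile i+1 begins and widths vary by one pixel instead of leaving holes. The plain C below is a sketch of that idea only; `scaled_edge` is a hypothetical helper, it uses integer arithmetic to avoid floating-point rounding at the far edge, and it may or may not match the actual cause of the gaps in this game.

```c
#include <assert.h>

/* Destination pixel position of tile boundary i (0 = left edge of the
 * first tile) when a source span of src_size pixels is scaled to
 * dst_size pixels. Tile i then covers
 * [scaled_edge(i, ...), scaled_edge(i + 1, ...)). */
int scaled_edge(int i, int tile_size, int src_size, int dst_size)
{
    assert(i >= 0 && tile_size > 0 && src_size > 0 && dst_size > 0);
    return (int)((long long)i * tile_size * dst_size / src_size);
}
```

With 32-pixel tiles and a 1888-pixel-wide level scaled to 1680 pixels, the tile widths come out as a mix of 28 and 29 pixels, and the last boundary lands exactly on 1680.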

Edgar Reynaldo
Major Reynaldo
May 2007

Andrew Gillett
Member #15,868
January 2015

I get 60fps +- 2, varies between around 1200 and 2500 / sec.

With 1024 sprites, 29fps +- 0, 140-230/sec.

Edgar Reynaldo
Major Reynaldo
May 2007

Ok, second question. Are you running on an integrated GPU or a dedicated card? What are your CPU specs and GPU model(s)? Have you updated your drivers since purchasing the card?

EDIT
Also, those numbers are pretty bad. Have you exceeded the maximum texture size and inadvertently produced a memory bitmap? How old is your gpu?

Andrew Gillett
Member #15,868
January 2015

Intel Core i5-3570K with integrated GPU, latest drivers
https://ark.intel.com/content/www/us/en/ark/products/65520/intel-core-i5-3570k-processor-6m-cache-up-to-3-80-ghz.html

As mentioned previously, I put in some code to confirm that none of the bitmaps being drawn to are memory bitmaps.

but...

I just tried ALLEGRO_NO_PRESERVE_TEXTURE again, having first tried it a while back, and now I get vastly better Render2 performance.

Render1: 8.5ms Render2: 0.5ms Frame: 9.0ms
Render1: 8.6ms Render2: 0.6ms Frame: 9.2ms
Render1: 9.3ms Render2: 0.7ms Frame: 9.9ms
Render1: 8.3ms Render2: 0.7ms Frame: 9.0ms
Render1: 8.6ms Render2: 0.6ms Frame: 9.3ms
Render1: 8.7ms Render2: 0.6ms Frame: 9.3ms
Render1: 8.5ms Render2: 0.5ms Frame: 9.1ms

Edgar Reynaldo
Major Reynaldo
May 2007
