Allegro 5 performance
Andrew Gillett

I've been developing a game using Allegro 4. Recently I tried switching to Allegro 5 due to some issues which I'll write about in a separate post. However, the performance is much, much worse. In Allegro 4 I was writing all graphics to a memory bitmap, then using AllegroGL to scale the backbuffer to the screen. In Allegro 5 I'm using al_draw_scaled_bitmap to display the backbuffer. With memory bitmaps it's incredibly slow, but even with video bitmaps it's far too slow.
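In code terms, the two-step path is roughly this (a simplified sketch; names are placeholders for my actual code, which draws around 2000 sprites in the first step):

```c
#include <allegro5/allegro.h>

/* Render1/Render2 split, sketched. 'backbuffer' is an off-screen video
 * bitmap the sprites are drawn into; it is then scaled onto the display. */
void render_frame(ALLEGRO_DISPLAY *display, ALLEGRO_BITMAP *backbuffer)
{
    /* Render1: draw sprites into the off-screen bitmap. */
    al_set_target_bitmap(backbuffer);
    al_clear_to_color(al_map_rgb(0, 0, 0));
    /* ... al_draw_bitmap() per sprite ... */

    /* Render2: scale the whole backbuffer onto the display. */
    al_set_target_backbuffer(display);
    al_draw_scaled_bitmap(backbuffer,
        0, 0, al_get_bitmap_width(backbuffer), al_get_bitmap_height(backbuffer),
        0, 0, al_get_display_width(display), al_get_display_height(display),
        0);
    al_flip_display();
}
```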

Comparison of profiling results (render1 = write sprites to backbuffer, render2 = scale backbuffer to screen)
(This is on a level where 2000 sprites are being displayed per frame)

Allegro 4:
Render1: 12ms, Render2: 10ms

Allegro 5, new bitmap flags = ALLEGRO_ALPHA_TEST
Render1: 30.9ms Render2: 13.6ms

As above but also with ALLEGRO_NO_PRESERVE_TEXTURE
Render1: 71.4ms Render2: 10.4ms

With ALLEGRO_OPENGL, but without ALLEGRO_NO_PRESERVE_TEXTURE
Render1: 95.9ms Render2: 15.3ms

Turned off alpha channel (will be ok for most sprites):
Render1: 28.9ms Render2: 15.1ms

Attached is a screenshot of the level. This level is much bigger than the standard one, but even on smaller levels the performance is terrible.

The tile sprites, which make up the majority of the sprites in this image, are 32x32.

dthompson

This sounds symptomatic of memory bitmaps being used elsewhere (i.e. as part of your pipeline, not just for texture preservation). Allegro 5 isn't good at memory bitmap stuff :P

So - are you using memory bitmaps anywhere in your Allegro 5 code when you say "even with video bitmaps it's far too slow"?

Andrew Gillett

There are only 2 calls to al_create_bitmap and 1 call to al_load_bitmap in the entire codebase, and al_set_new_bitmap_flags is called before each of them. If I call al_get_bitmap_flags before each draw, I get 0x410 (ALLEGRO_VIDEO_BITMAP | ALLEGRO_ALPHA_TEST) for most of them, but the display gets 0x400 (ALLEGRO_VIDEO_BITMAP).

Chris Katko

It's likely you're doing something wrong. Weird drawing code. Locking bitmaps (which forces copying to memory). Etc.

I get easily 100x performance with Allegro 5 with a similar tiled bitmap game on a tiny netbook.

Is this Linux, Windows, or Mac OS X?

MikiZX

I believe you have likely done some testing already, but just in case this helps: if you are working with tilemaps and Allegro 5, a good way to speed things up considerably is to modify your tilemap drawing code to use vertex buffer objects (you can check a non-optimized version of this here: https://github.com/mikiZX/Allegro5-2d-Platformer-Demo-using-VBOs-and-Tiled-tilemaps ).
You could also optimize this (and likely your actual code) by segmenting your tilemap into smaller sections and only drawing the ones which are actually visible on screen (sort of using many 32x32-tile segments) - in case you are currently drawing all of them each frame while only part is visible.
As suggested above, some Allegro 5 drawing operations on video bitmaps (e.g. al_put_pixel) require locking the bitmap before the operation is performed - calling al_put_pixel many times without locking the bitmap first can slow things down a lot.
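The culling part is just index arithmetic; a sketch (assuming a 32x32 tile grid and pixel-based camera coordinates; `draw_tile` is a placeholder):

```c
/* Compute the inclusive range of tile columns/rows overlapping the view,
 * so only visible tiles are drawn. Pure arithmetic, no Allegro calls. */
int first_visible(int cam_px, int tile_size)
{
    return cam_px / tile_size;                       /* first overlapping index */
}

int last_visible(int cam_px, int view_px, int tile_size, int map_tiles)
{
    int last = (cam_px + view_px - 1) / tile_size;   /* last overlapping index */
    return last < map_tiles - 1 ? last : map_tiles - 1;  /* clamp to map edge */
}

/* Usage sketch:
 *   for (int ty = first_visible(cam_y, 32); ty <= last_visible(cam_y, view_h, 32, map_h); ++ty)
 *       for (int tx = first_visible(cam_x, 32); tx <= last_visible(cam_x, view_w, 32, map_w); ++tx)
 *           draw_tile(tx, ty);   // placeholder for the actual tile draw
 */
```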

Andrew Gillett

I am on Windows. I am not calling al_lock_bitmap, and all put/get_pixel calls are commented out.

I have done some more tests, and found that even when nothing was being drawn, a single call to OutputDebugString (Microsoft's function for writing to Visual Studio's debug output) caused the enclosing profiling region to vary randomly between low numbers (e.g. 0.5ms - still not great for a single debug print, but not out of the ordinary for that function) and very high numbers, sometimes over 10ms - see attached image.

Problem solved? I re-enabled the drawing code and commented out OutputDebugString, but performance was no better - still in the region of 40-80ms per frame. Both drawing the sprites to the backbuffer, and drawing the backbuffer to the screen, vary by up to a factor of 2, although the latter is more stable.

Does it matter that the backbuffer does not have power of 2 width/height?

MikiZX

For sure the Allegro wizards that follow this forum will be able to provide a definite answer, but I've searched Allegro's GitHub repository for 'power-of-two' and 'pot' and found this (it seems to apply to both OpenGL and DirectX):
"Also, we do support old OpenGL drivers where textures must have power-of-two dimensions. If a non-power-of-two bitmap is created in such a case, we use a texture with the next larger POT dimensions, and just keep some unused padding space to the right/bottom of the pixel data."
So it appears this is handled internally by Allegro and should not be the issue here.
I would try profiling the code to pinpoint the exact section that runs slow. There are likely better ways, but a noob like me would keep a global variable and store the system time in it every so often throughout the code; each time, just before storing a new value, check the difference between the current time and the stored one - if the value is large enough, print a debug message.

Edgar Reynaldo

In Allegro 5 you just don't use memory bitmaps, hardly ever.

The A5 way is drawing from a tile atlas. You have something like 22 unique tiles on that screen at 32x32; laid out as 2 rows of 11, that's 352x64, which would easily fit in an atlas of 512x512 or even smaller.
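Slicing such an atlas into per-tile handles is a one-time job at load; a sketch (untested; the 2x11 layout and the tile count are assumptions from the screenshot):

```c
#include <allegro5/allegro.h>

#define TILE 32
#define NUM_TILES 22

ALLEGRO_BITMAP *tiles[NUM_TILES];

/* Sub-bitmaps share the parent's texture, so drawing any of them never
 * forces a texture switch on the GPU. */
void slice_atlas(ALLEGRO_BITMAP *atlas)
{
    for (int i = 0; i < NUM_TILES; ++i) {
        int col = i % 11, row = i / 11;   /* 2 rows of 11 tiles */
        tiles[i] = al_create_sub_bitmap(atlas, col * TILE, row * TILE,
                                        TILE, TILE);
    }
}
```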

8-)

And of course OutputDebugString will slow your program down, writing to the console like that. It's much faster to write to a file.

EDIT
Show game loop and rendering code.

kenmasters1976

The A5 way is drawing from a tile atlas

What's the difference between using a 512x512 atlas (together with sub-bitmaps, I assume) and using 22 different source bitmaps? Does it affect performance?
Also, if I remember correctly, sub-bitmaps don't support tiling, so when you need tiling you can't use them.

Quote:

Show game loop and rendering code

I second this; it could be useful for doing a benchmark on different systems. I've noticed that Allegro 5 performance drops when drawing lots of bitmaps, but I always blamed it on having an old graphics card and using open source drivers. I've never done any serious benchmarking, but I have noticed CPU usage increasing considerably when drawing lots of bitmaps, which always seemed a bit off to me considering that Allegro 5 is supposed to do its drawing on the GPU. As I said, though, I've never tested this seriously, and just considered using the low-level primitives in cases where you need to draw lots of bitmaps.

dthompson

What's the difference between using a 512x512 atlas (together with sub-bitmaps, I assume) and using 22 different source bitmaps? Does it affect performance?

Using 22 separate bitmaps probably won't be an issue, but once you start getting into the hundreds, I'd imagine you'd start seeing a serious dip in performance (or increased memory usage). I'm being quite unscientific here though; I'm just aware that it's better to have fewer discrete textures.

Quote:

I've never done any serious testing and just considered using the low level primitives in cases when you need to draw lots of bitmaps.

Yes indeed - it'll be even faster if you're using vertex buffers.

Polybios

What's the difference between using a 512x512 atlas (together with sub-bitmaps, I assume) and using 22 different source bitmaps? Does it affect performance?

Yes! IIRC, drawing from different source bitmaps is the equivalent of several OpenGL texture changes. This will slow down drawing and should be avoided if possible (e.g. use an atlas with sub bitmaps, or even sort by source bitmap if possible).

So it's not about the number of source bitmaps but the number of switches between them. IIRC, this should be addressed before attempting to use vertex buffers.

The GPU is happiest if you set up "state" once and then only send geometry or alter matrices.
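Allegro can also batch consecutive draws from the same texture for you; a sketch using al_hold_bitmap_drawing (it assumes the draws are already grouped by source bitmap, which is the whole point):

```c
#include <allegro5/allegro.h>

/* While drawing is held, consecutive draws from the same texture are
 * queued and submitted together instead of costing one GPU call each. */
void draw_sorted(ALLEGRO_BITMAP *tile, const float *xs, const float *ys, int n)
{
    al_hold_bitmap_drawing(true);
    for (int i = 0; i < n; ++i)
        al_draw_bitmap(tile, xs[i], ys[i], 0);
    al_hold_bitmap_drawing(false);   /* flushes the queued batch */
}
```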

MikiZX

I've never done any serious benchmark but I've noticed CPU usage increasing considerably when drawing lots of bitmaps, which always seemed a bit off to me considering that Allegro 5 is supposed to do its drawing on the GPU

I doubt this would be due to your graphics card (unless it is maybe over 15 years old). The likely reason CPU usage increases is the cost of 'feeding' the GPU the data it needs to draw: the CPU-GPU data transfer bottlenecks on the CPU side, because the CPU cannot feed data to the GPU at the rate at which the GPU can draw. This, I believe, is the main reason to have the CPU send the data to the GPU only once (using a VBO) rather than many times (once per bitmap you wish to draw). In my experience (not very extensive, mind you) it is even faster to give the CPU the task of re-creating the VBO each frame and drawing that, than to send each bitmap individually. The atlas idea comes in because one VBO draw call is bound to only one bitmap (or a single set of bitmaps if a special shader is used) - if you want to draw from different bitmaps with VBOs, you need one VBO per bitmap, which again increases the amount of CPU-GPU talk. Thus packing all your bitmaps into a single one and issuing only one VBO draw call will/should be the fastest way.
Also, the context changes mentioned above are a factor, again mitigated by using a single VBO.
As for tiling, this can be achieved with a VBO, though it requires adding extra geometry to simulate the tiling effect (as opposed to simply extending the UV coordinates, which I believe is what one would normally do - if I understand your question correctly).
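A minimal sketch of the one-VBO-per-layer idea with Allegro's primitives addon (the vertex filling is elided; untested):

```c
#include <allegro5/allegro.h>
#include <allegro5/allegro_primitives.h>

/* Each tile contributes two triangles (6 ALLEGRO_VERTEX entries) whose
 * u/v coordinates select its region of the atlas. One draw call then
 * renders the whole layer from a single texture. */
ALLEGRO_VERTEX_BUFFER *build_layer(const ALLEGRO_VERTEX *verts, int num_tiles)
{
    /* NULL decl means the default ALLEGRO_VERTEX layout. */
    return al_create_vertex_buffer(NULL, verts, num_tiles * 6,
                                   ALLEGRO_PRIM_BUFFER_STATIC);
}

void draw_layer(ALLEGRO_VERTEX_BUFFER *vb, ALLEGRO_BITMAP *atlas, int num_tiles)
{
    al_draw_vertex_buffer(vb, atlas, 0, num_tiles * 6,
                          ALLEGRO_PRIM_TRIANGLE_LIST);
}
```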

Edgar Reynaldo

If you don't need tiling, atlases and sub-bitmaps are the way to go.

If you need tiling, you need either one texture per tile, or a shader.

kenmasters1976

Well, that's some interesting info on VBOs which is not immediately obvious when using Allegro 5, particularly if you come from Allegro 4. I always thought 'hardware accelerated' meant the graphics card would take care of all the drawing with little to no load on the CPU. I also assumed that loading a bitmap as a video bitmap meant it was loaded as a texture in video memory, from where the graphics card could access and draw it with no CPU load either, even when using the traditional Allegro 4 way of drawing things. But apparently it is passing the coordinates/geometry to the GPU that can slow down the process? This requires a whole new way of thinking about the drawing process in Allegro 5.

Chris Katko

1. Can you just post the whole project so we can profile it?
2. What are your hardware specs?

I have no problems holding 60 FPS... on a netbook... with an i3 celeron and intel HD graphics while drawing multiple tile layers (floors, walls, "decals", "dirt", and lighting layers) in 1366x768.

I don't use VBOs or display lists or anything "fancy".

It's even possible you're measuring wrong. If you're running Windows, use one of those nVidia (or Windows [Windows Key + G] or whatever) FPS meters. Linux might have an equivalent.

Also, what's your video mode? You're not in some kind of creepy 24-bit mode? (somehow different from your images in memory, forcing a colour conversion every frame.)

Letting other people compile and profile your project will eliminate many of these variables.

Andrew Gillett

I've done some more profiling and I'm getting better results than I had been getting before – so it's possible there was something else going on like a rogue logging call somewhere.

Here are some profiling results from a simplified test where it draws around 2000 32x32 sprites per frame. Render1 is drawing the sprites and Render2 is scaling the backbuffer to the screen. In this test, the screen resolution is 1680x1050 and the back buffer is 1888x1120 (although most levels will be considerably smaller than this).

Allegro 5
Render1: 10.0ms Render2: 18.8ms Frame: 28.8ms
Render1: 9.4ms Render2: 20.9ms Frame: 30.3ms
Render1: 10.0ms Render2: 20.3ms Frame: 30.3ms
Render1: 8.8ms Render2: 23.8ms Frame: 32.6ms
Render1: 8.6ms Render2: 16.9ms Frame: 25.5ms
Render1: 9.9ms Render2: 18.9ms Frame: 28.8ms
Render1: 9.7ms Render2: 20.8ms Frame: 30.6ms

Allegro 4 (Render2 uses allegro_gl_make_texture_ex and draws a quad, Render1 uses draw_rle_sprite)
Render1: 13.6ms Render2: 8.9ms Frame: 22.5ms
Render1: 13.6ms Render2: 8.9ms Frame: 22.5ms
Render1: 13.7ms Render2: 8.9ms Frame: 22.5ms
Render1: 13.7ms Render2: 8.9ms Frame: 22.5ms
Render1: 13.6ms Render2: 8.9ms Frame: 22.5ms
Render1: 13.6ms Render2: 8.9ms Frame: 22.5ms
Render1: 13.6ms Render2: 8.9ms Frame: 22.4ms
Render1: 13.6ms Render2: 8.9ms Frame: 22.5ms

In this case you can see that Allegro 5 is faster at drawing the small sprites but slower at scaling the back buffer to the screen as compared to the AllegroGL code I'm using in the old version. I'm going to look into the practicality of scaling the sprites directly to the screen rather than going through the additional render2 step.

The players in my game are drawn by creating a temporary bitmap, copying the relevant head and body parts to the bitmap, and then drawing that to the screen – with additional steps if lighting and/or transparency are needed. In the Allegro 4 version they are also modified in real time using get/put pixel (primarily replacing the default player colour with the desired player colour, but also sometimes for a shield effect which puts a white outline around the player). I know this kind of pixel replacement is a non-starter for Allegro 5. Even without any lighting or transparency effects, in my test the Allegro 5 version averages 8ms to draw 16 players (49 draw sprite calls), while the Allegro 4 version runs about 20% faster. Given that the pixel replacement stuff is not feasible for Allegro 5, I will have to generate and store the sprites at the start of the level, so that should help performance.
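In sketch form, what I have in mind for baking the sprites at level load (bake_player and the colour-key comparison are placeholders for my actual logic; untested):

```c
#include <allegro5/allegro.h>

/* Replace the default player colour once, up front, instead of per frame.
 * Locking is slow, but here it happens once per player per level, not
 * once per draw. */
ALLEGRO_BITMAP *bake_player(ALLEGRO_BITMAP *template_bmp,
                            ALLEGRO_COLOR from, ALLEGRO_COLOR to)
{
    ALLEGRO_BITMAP *baked = al_clone_bitmap(template_bmp);
    unsigned char fr, fg, fb, fa;
    al_unmap_rgba(from, &fr, &fg, &fb, &fa);

    al_set_target_bitmap(baked);   /* al_put_pixel writes to the target */
    al_lock_bitmap(baked, ALLEGRO_PIXEL_FORMAT_ANY, ALLEGRO_LOCK_READWRITE);
    for (int y = 0; y < al_get_bitmap_height(baked); ++y) {
        for (int x = 0; x < al_get_bitmap_width(baked); ++x) {
            unsigned char r, g, b, a;
            al_unmap_rgba(al_get_pixel(baked, x, y), &r, &g, &b, &a);
            if (r == fr && g == fg && b == fb)   /* matches key colour */
                al_put_pixel(x, y, to);
        }
    }
    al_unlock_bitmap(baked);
    return baked;
}
```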

Both machines I've tested on are around six or seven years old and have Intel on-board graphics. Timing uses QueryPerformanceCounter.

Something strange I noticed is that when I request a list of graphics modes in Allegro 5, all the modes are reported as having 23 bit colour depth. However, when I print the actual bitmap colour depths, they are all 32.

dthompson

Just to be absolutely sure: what frame rates are you getting? Could any of this be vsync-related? I notice that all of the 'Render2' timings are over 16.6ms (some of them very close) which is close to the frame timing of a 60Hz display with vsync: 1000ms / 60Hz = 16.6ms.

Andrew Gillett

vsync is off. I just tested it on a much better PC and got these timings (window resolution was 1768x992):

Render1: 6.0ms Render2: 3.9ms Frame: 9.9ms
Render1: 6.0ms Render2: 4.0ms Frame: 10.0ms
Render1: 6.0ms Render2: 4.0ms Frame: 10.0ms
Render1: 6.0ms Render2: 3.9ms Frame: 9.9ms
Render1: 6.1ms Render2: 4.0ms Frame: 10.1ms
Render1: 6.0ms Render2: 4.0ms Frame: 10.0ms
Render1: 6.0ms Render2: 4.2ms Frame: 10.2ms
Render1: 6.2ms Render2: 3.9ms Frame: 10.2ms
Render1: 6.0ms Render2: 3.9ms Frame: 9.9ms

kenmasters1976

I'm going to look into the practicality of scaling the sprites directly to the screen rather than going through the additional render2 step

In my recent Allegro 5 project I set up an Allegro transformation so that all drawing is automatically scaled to fit the screen/window size. It seems to work pretty well for 2D.
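Roughly like this, I believe (sketch; level_w/level_h stand for whatever the game's logical resolution is):

```c
#include <allegro5/allegro.h>

/* Scale all subsequent drawing to fit the window, so sprites can be drawn
 * in level coordinates with no intermediate backbuffer. */
void use_fit_transform(ALLEGRO_DISPLAY *display, float level_w, float level_h)
{
    ALLEGRO_TRANSFORM t;
    al_set_target_backbuffer(display);
    al_identity_transform(&t);
    al_scale_transform(&t, al_get_display_width(display) / level_w,
                           al_get_display_height(display) / level_h);
    al_use_transform(&t);
    /* ... draw everything in level coordinates ... */
}
```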

I just tested it on a much better PC and got these timings

Is the Allegro 4 timing still better?

Andrew Gillett

Timings from a Paperspace cloud PC, which is faster than mine:

al5
Render1: 9.0ms Render2: 4.9ms Frame: 13.9ms
Render1: 9.8ms Render2: 5.1ms Frame: 14.9ms
Render1: 11.0ms Render2: 5.5ms Frame: 16.5ms
Render1: 10.9ms Render2: 4.4ms Frame: 15.3ms
Render1: 9.2ms Render2: 4.3ms Frame: 13.5ms
Render1: 8.7ms Render2: 4.7ms Frame: 13.4ms
Render1: 8.7ms Render2: 4.4ms Frame: 13.2ms
Render1: 9.8ms Render2: 4.1ms Frame: 13.9ms
Render1: 9.3ms Render2: 4.0ms Frame: 13.3ms
Render1: 9.1ms Render2: 4.1ms Frame: 13.2ms
Render1: 9.8ms Render2: 4.6ms Frame: 14.4ms
Render1: 9.8ms Render2: 5.6ms Frame: 15.4ms
Render1: 9.3ms Render2: 4.0ms Frame: 13.3ms

al4
Render1: 11.0ms Render2: 2.6ms Frame: 13.6ms
Render1: 11.0ms Render2: 2.6ms Frame: 13.6ms
Render1: 10.9ms Render2: 2.6ms Frame: 13.5ms
Render1: 10.8ms Render2: 2.5ms Frame: 13.4ms
Render1: 10.8ms Render2: 2.5ms Frame: 13.3ms
Render1: 10.8ms Render2: 2.5ms Frame: 13.2ms
Render1: 10.7ms Render2: 2.4ms Frame: 13.2ms
Render1: 10.6ms Render2: 2.4ms Frame: 13.1ms
Render1: 10.6ms Render2: 2.4ms Frame: 13.0ms
Render1: 10.5ms Render2: 2.4ms Frame: 13.0ms

EDIT: I originally wrote that the Al5 version varies a lot in frame timings, but I just realised that the Al4 version is showing smoothed-out timings (this text was originally written to the screen, and is hard to read if the numbers vary a lot every frame), whereas I changed this in the Al5 version.

I tried scaling sprites directly to the screen rather than using an intermediate backbuffer. These are the timings I got on my desktop - a little better than before but not much.

Render1: 8.6ms Render2: 13.9ms Frame: 22.6ms
Render1: 10.0ms Render2: 18.2ms Frame: 28.2ms
Render1: 8.4ms Render2: 22.7ms Frame: 31.1ms
Render1: 8.8ms Render2: 15.6ms Frame: 24.4ms
Render1: 11.2ms Render2: 16.8ms Frame: 28.1ms
Render1: 9.3ms Render2: 20.6ms Frame: 29.9ms
Render1: 8.6ms Render2: 15.0ms Frame: 23.6ms

The problem is that the scaling ratio is not a whole number, so in this level I end up with a 1 pixel gap every 2 or 3 tiles.

Edgar Reynaldo

Download my binaries and try ex_draw_bitmap after running RunA525Examples.bat from a command line.

https://bitbucket.org/bugsquasher/unofficial-allegro-5-binaries/downloads/

Andrew Gillett

I get 60fps +- 2, varies between around 1200 and 2500 / sec.

With 1024 sprites, 29fps +- 0, 140-230/sec.

Edgar Reynaldo

Ok, second question: are you running on an integrated GPU or a dedicated card? What are your CPU specs and GPU model(s)? Have you updated your drivers since purchasing the card?

EDIT
Also, those numbers are pretty bad. Have you exceeded the maximum texture size and inadvertently produced a memory bitmap? How old is your GPU?

Andrew Gillett

Intel Core i5-3570K with integrated GPU, latest drivers
https://ark.intel.com/content/www/us/en/ark/products/65520/intel-core-i5-3570k-processor-6m-cache-up-to-3-80-ghz.html

As mentioned previously, I put in some code to confirm that none of the bitmaps being drawn to are memory bitmaps.

but...

I just tried ALLEGRO_NO_PRESERVE_TEXTURE again, having tried it a while back, and now I get vastly better Render2 performance.

Render1: 8.5ms Render2: 0.5ms Frame: 9.0ms
Render1: 8.6ms Render2: 0.6ms Frame: 9.2ms
Render1: 9.3ms Render2: 0.7ms Frame: 9.9ms
Render1: 8.3ms Render2: 0.7ms Frame: 9.0ms
Render1: 8.6ms Render2: 0.6ms Frame: 9.3ms
Render1: 8.7ms Render2: 0.6ms Frame: 9.3ms
Render1: 8.5ms Render2: 0.5ms Frame: 9.1ms
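For reference, this is roughly how the backbuffer gets created with that flag (sketch; the function name is a placeholder):

```c
#include <allegro5/allegro.h>

/* With ALLEGRO_NO_PRESERVE_TEXTURE, Allegro's D3D driver skips backing the
 * texture up to system memory. The bitmap's contents are lost on a device
 * reset and must be redrawn, but per-frame rendering gets much cheaper. */
ALLEGRO_BITMAP *create_backbuffer(int w, int h)
{
    al_set_new_bitmap_flags(ALLEGRO_VIDEO_BITMAP | ALLEGRO_NO_PRESERVE_TEXTURE);
    return al_create_bitmap(w, h);
}
```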

Edgar Reynaldo

Those numbers are much better. With the D3D driver, Allegro automatically tries to back up textures, which can be very slow. Try the OpenGL driver and see what your numbers are like.

Chris Katko

95% of all problems would be solved by posting the project file instead of just guessing into a black box.

Andrew Gillett

I have uploaded a project here:

https://drive.google.com/file/d/106m6ZxFLGTzWhnULdlzYuJM172r8dgkp/view?usp=sharing

What I've found is that for this test, performance is much worse in OpenGL, as compared to Direct3D with ALLEGRO_NO_PRESERVE_TEXTURE set on the backbuffer. The Render1 step which draws lots of small sprites takes about 24ms, compared with 8ms for Direct3D.

If you want to try it, the key lines to look at are:

Psector.cpp 15, 441, 469, 499
Sprites.cpp 20-23, 76, 159

Chris Katko

I installed visual studio 2019 and loaded your project and got this error

C:\Users----\Downloads\psector.ultrasimplifiedprofiling\psector.vcxproj : error  : Error HRESULT E_FAIL has been returned from a call to a COM component.

sigh... why does this stuff never work the first time. ::)

Quote:

</entry>
<entry>
<record>402</record>
<time>2020/09/08 11:35:07.956</time>
<type>Information</type>
<source>VisualStudio</source>
<description>Begin package load [Visual C++ Package]</description>
<guid>{1B027A40-8F43-11D0-8D11-00A0C91BC942}</guid>
</entry>
<entry>
<record>403</record>
<time>2020/09/08 11:35:07.957</time>
<type>Error</type>
<source>VisualStudio</source>
<description>LegacySitePackage failed for package [Visual C++ Package]Source: &apos;mscorlib&apos; Description: ValueFactory attempted to access the Value property of this instance.&#x000D;&#x000A;System.InvalidOperationException: ValueFactory attempted to access the Value property of this instance.&#x000D;&#x000A; at System.Lazy`1.CreateValue()&#x000D;&#x000A;--- End of stack trace from previous location where exception was thrown ---&#x000D;&#x000A; at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()&#x000D;&#x000A; at System.Lazy`1.get_Value()&#x000D;&#x000A; at Microsoft.VisualStudio.VC.CppSvc.get_IVCPreferences()&#x000D;&#x000A; at Microsoft.VisualStudio.VC.ManagedInterop.Initialize(IServiceProvider serviceProvider)</description>
<guid>{1B027A40-8F43-11D0-8D11-00A0C91BC942}</guid>
<hr>80131509</hr>
<errorinfo></errorinfo>
</entry>
<entry>
<record>404</record>
<time>2020/09/08 11:35:07.959</time>
<type>Error</type>
<source>VisualStudio</source>
<description>SetSite failed for package [Visual C++ Package](null)</description>
<guid>{1B027A40-8F43-11D0-8D11-00A0C91BC942}</guid>
<hr>80131509</hr>
<errorinfo></errorinfo>
</entry>
<entry>
<record>405</record>
<time>2020/09/08 11:35:07.961</time>
<type>Error</type>
<source>VisualStudio</source>
<description>End package load [Visual C++ Package]</description>
<guid>{1B027A40-8F43-11D0-8D11-00A0C91BC942}</guid>
<hr>80131509</hr>
<errorinfo></errorinfo>
</entry>
<entry>
<record>406</record>
<time>2020/09/08 11:35:07.964</time>
<type>Information</type>
<source>VisualStudio</source>
<description>Begin package load [Visual C++ Package]</description>
<guid>{1B027A40-8F43-11D0-8D11-00A0C91BC942}</guid>
</entry>
<entry>
<record>407</record>
<time>2020/09/08 11:35:07.964</time>
<type>Error</type>
<source>VisualStudio</source>
<description>LegacySitePackage failed for package [Visual C++ Package]Source: &apos;mscorlib&apos; Description: ValueFactory attempted to access the Value property of this instance.&#x000D;&#x000A;System.InvalidOperationException: ValueFactory attempted to access the Value property of this instance.&#x000D;&#x000A; at System.Lazy`1.CreateValue()&#x000D;&#x000A;--- End of stack trace from previous location where exception was thrown ---&#x000D;&#x000A; at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()&#x000D;&#x000A; at System.Lazy`1.get_Value()&#x000D;&#x000A; at Microsoft.VisualStudio.VC.CppSvc.get_IVCPreferences()&#x000D;&#x000A; at Microsoft.VisualStudio.VC.ManagedInterop.Initialize(IServiceProvider serviceProvider)</description>
<guid>{1B027A40-8F43-11D0-8D11-00A0C91BC942}</guid>
<hr>80131509</hr>
<errorinfo></errorinfo>
</entry>
</activity>

... C++ didn't install right maybe? ??? I have no idea why I'm a magnet for such obscure errors.

[edit] hmmm........ maybe a combination of visual studio "wanting to restart to finish" + windows feature update applying on restart clobbered a dependency. I bet that's it.

GG Microsoft.

[edit] it compiles now.

Andrew Gillett

Which version of MSVC 2019 are you on?

Here they suggest updating and deleting the .vs folder
https://developercommunity.visualstudio.com/content/problem/526997/vs2019-update-to-1601-broke-something.html

Chris Katko

[edit] nvm it's a user pragma

Well this looks wrong. I sure as heck don't have a "23bit" or "13bit" color screen.

```
The thread 0x3dec has exited with code 0 (0x0).
640x480, 23bit, 60Hz
640x480, 23bit, 72Hz
640x480, 23bit, 75Hz
720x480, 23bit, 60Hz
720x480, 23bit, 59Hz
[... every resolution up to 4096x2160 listed as 23bit ...]
4096x2160, 23bit, 29Hz
640x480, 13bit, 60Hz
[... the same list repeated as 13bit ...]
4096x2160, 13bit, 29Hz
Config screen resolution is invalid - trying desktop res (fullscreen) or first resolution below desktop (windowed)
Changing video mode to 1400x1050, windowed, 32 bit colour
```

Andrew Gillett

That's fine, it's just a reminder to me that I need to update that code at some point.

Chris Katko

One thing I notice is al_set_target_bitmap being called for every sprite. That may incur a cost (and the driver update may have reduced that cost). If the target is always a render buffer, or the screen, etc., you should only be calling that once [per actual need], as the driver may be using the changing [current target] to decide when to cache / move memory around. (complete guess, but it's definitely different from how I've ever coded my projects)

Andrew Gillett

I mentioned in a previous post:
"Something strange I noticed is that when I request a list of graphics modes in Allegro 5, all the modes are reported as having 23 bit colour depth. However, when I print the actual bitmap colour depths, they are all 32."

This is the code which prints the modes:

```c
for (int i = 0; i < al_get_num_display_modes(); ++i)
{
    ALLEGRO_DISPLAY_MODE mode;
    al_get_display_mode(i, &mode);
    Log("%dx%d, %dbit, %dHz\n", mode.width, mode.height, mode.format, mode.refresh_rate);
}
```
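Looking at it now, I suspect mode.format is an ALLEGRO_PIXEL_FORMAT enum value rather than a bit count, which would explain the odd 23/13 numbers. Converting it first might look like this (sketch, untested):

```c
#include <stdio.h>
#include <allegro5/allegro.h>

void print_modes(void)
{
    for (int i = 0; i < al_get_num_display_modes(); ++i) {
        ALLEGRO_DISPLAY_MODE mode;
        al_get_display_mode(i, &mode);
        printf("%dx%d, %dbit, %dHz\n", mode.width, mode.height,
               al_get_pixel_format_bits(mode.format),  /* enum -> bit count */
               mode.refresh_rate);
    }
}
```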

Chris Katko

profile shows

7% of time was spent just in al_set_target() on my machine
14% on al_draw_bitmap

though I'm not used to this MSVC UI for profiling

[edit]

24% was in waiting for events, so it's either vsync or it's set to only draw at a fixed rate.

[edit]

I set al_set_new_display_option(ALLEGRO_VSYNC, 2, ALLEGRO_REQUIRE); //vsync off

600: Advance: 0.0ms  Render1: 7.0ms  Render2: 0.5ms  Frame: 7.4ms
600: Advance: 0.0ms  Render1: 7.5ms  Render2: 0.5ms  Frame: 8.0ms
600: Advance: 0.0ms  Render1: 7.9ms  Render2: 0.3ms  Frame: 8.3ms
600: Advance: 0.0ms  Render1: 9.9ms  Render2: 0.3ms  Frame: 10.3ms
600: Advance: 0.0ms  Render1: 7.3ms  Render2: 0.3ms  Frame: 7.6ms
600: Advance: 0.0ms  Render1: 9.5ms  Render2: 0.3ms  Frame: 9.8ms
600: Advance: 0.0ms  Render1: 7.3ms  Render2: 0.3ms  Frame: 7.7ms
600: Advance: 0.0ms  Render1: 9.5ms  Render2: 0.5ms  Frame: 10.0ms
600: Advance: 0.0ms  Render1: 7.9ms  Render2: 0.4ms  Frame: 8.2ms
600: Advance: 0.0ms  Render1: 7.7ms  Render2: 0.3ms  Frame: 8.0ms

I'm getting over 100 FPS even with al_set_target_bitmap (still uses 8% cpu time!). Granted, it's a GTX 1060.

Still, it's showing plenty of time taken in the event queue so maybe I'll need to force no vsync with my drivers. That, or you did something wrong with the event queue/timers and it's forcing it to wait.

[edit] Still blowing 25% of the time waiting for events with vsync forced off. Take a look through your event handler / timing code when you can; I'm out of time for the moment.

// 25% in this block
// psector.cpp:354
while (!al_is_event_queue_empty(eventQueue))
{
    ALLEGRO_EVENT event;

    al_wait_for_event(eventQueue, &event);

    switch (event.type)
    {
        case ALLEGRO_EVENT_DISPLAY_CLOSE:
            quit = true;
            break;
        case ALLEGRO_EVENT_DISPLAY_LOST:
            Log("ALLEGRO_EVENT_DISPLAY_LOST\n");
            break;
        case ALLEGRO_EVENT_DISPLAY_HALT_DRAWING:
            Log("ALLEGRO_EVENT_DISPLAY_HALT_DRAWING\n");
            break;
        case ALLEGRO_EVENT_DISPLAY_RESUME_DRAWING:
            Log("ALLEGRO_EVENT_DISPLAY_RESUME_DRAWING\n");
            break;
        case ALLEGRO_EVENT_TIMER:
            ++timer_value;
            ++totalTicks;
            break;
    }
}

You know, the second I look at that, it looks completely normal. I was expecting something more... squirrelly.

[edit]

Wait, no... how... if you're waiting for Allegro to fire off an event, it's literally impossible to exceed that rate. I don't think that's how you'd set up timing for a benchmark.

Here you set a timer based on the refresh rate of the screen:

//psector.cpp:232
  alTimer = al_create_timer(1.0 / update_rate);
  al_register_event_source(eventQueue, al_get_timer_event_source(alTimer));
  al_start_timer(alTimer);

and that's the only thing triggering the exit from the while loop above.

p.s. profilers rule. 8-)

Peter Hull

Great work CK!

Just one thing,

Well this looks wrong. I sure as heck don't have a "23bit" or "13bit" color screen.

The format member is not a bit depth, it's a pixel format, one of these. So 13 is ALLEGRO_PIXEL_FORMAT_RGB_565 and 23 is ALLEGRO_PIXEL_FORMAT_XRGB_8888. Hopefully that makes more sense.

Andrew Gillett

The game runs at a maximum of 60fps. The timings I've been looking at are those measured by my own timing system, which measures the render1 and render2 times and outputs them once the level has been running for 10 seconds. You're right in saying that this system isn't ideal for profiling using the MSVC profiler. Commenting out "if (timer_value > 0)" and "if (timer_value == 0 || ++framesSkipped == 4)" should allow it to run flat-out.

UPDATE:
I added the optimisation to only call al_set_target_bitmap when it needs to change. I see a boost to Render1 performance but strangely, al_flip_display consistently takes longer. I have split it out from the Render2 timing, which confirms that all the Render2 time is taken up by al_flip_display.

Direct3D
Without targetBitmap optimisation
	Render1: 9.3ms  Render2: 0.0ms  Flip: 0.5ms Frame: 9.8ms
	Render1: 9.2ms  Render2: 0.0ms  Flip: 0.6ms Frame: 9.9ms
	Render1: 9.3ms  Render2: 0.0ms  Flip: 0.7ms Frame: 9.9ms
	Render1: 8.9ms  Render2: 0.0ms  Flip: 0.5ms Frame: 9.5ms
	Render1: 9.0ms  Render2: 0.0ms  Flip: 0.5ms Frame: 9.5ms

With targetBitmap optimisation
	Render1: 6.5ms  Render2: 0.0ms  Flip: 0.9ms Frame: 7.5ms
	Render1: 6.9ms  Render2: 0.0ms  Flip: 1.0ms Frame: 7.9ms
	Render1: 6.9ms  Render2: 0.0ms  Flip: 0.9ms Frame: 7.9ms
	Render1: 7.4ms  Render2: 0.0ms  Flip: 1.0ms Frame: 8.4ms
	Render1: 6.7ms  Render2: 0.0ms  Flip: 1.0ms Frame: 7.7ms

I also added an option for it to run flat out, without waiting for a timer event. When I enable that, I get these results:

With RUN_FLAT_OUT and targetBitmap optimisation
	Render1: 7.9ms  Render2: 0.0ms  Flip: 8.0ms Frame: 16.0ms
	Render1: 7.6ms  Render2: 0.0ms  Flip: 8.0ms Frame: 15.7ms
	Render1: 7.4ms  Render2: 0.0ms  Flip: 7.7ms Frame: 15.2ms
	Render1: 7.4ms  Render2: 0.0ms  Flip: 7.3ms Frame: 14.7ms
	Render1: 6.0ms  Render2: 0.0ms  Flip: 8.7ms Frame: 14.8ms

This is with vsync off using ALLEGRO_REQUIRE, although as you say the driver does not necessarily honour that.

Updated project:
https://drive.google.com/file/d/1RfEbTlI3yn2lzxaClDGe9Sb3j0ZYBPO9/view?usp=sharing

See these lines in updated project:
#define RUN_FLAT_OUT 0 (psector.cpp)
#define SET_TARGET_BITMAP_OPTIMISATION 1 (sprites.cpp)

Chris Katko

I added the optimisation to only call al_set_target_bitmap when it needs to change. I see a boost to Render1 performance but strangely, al_flip_display consistently takes longer. I have split it out from the Render2 timing, which confirms that all the Render2 time is taken up by al_flip_display.

That still does seem like vsync (or another hard limiter). So the question now isn't so much performance but "is there something wrong in your code, or does everyone else have the same FPS limit when using Allegro on Windows?"

Also, do you have Intel, AMD, or nVidia hardware?

Thanks,
--Chris

[edit] I dev in Linux and I've never had these problems, so I'm guessing it's a Windows-only issue unless there's another squirrelly timing issue hidden.

Andrew Gillett

Intel Core i5-3570K with integrated Intel HD Graphics 4000 GPU

Chris Katko

Is it a laptop, with some sort of power saving mode?

This is with vsync off using ALLEGRO_REQUIRE, although as you say the driver does not necessarily honour that.

Oddly enough, with REQUIRE, it's supposed to fail/crash if it can't force it. But who knows if that actually works.

Andrew Gillett

No, it's a desktop.

Edgar Reynaldo

Allegro can't always override the driver's decision about VSYNC. You need to disable VSYNC in your driver for your application to be sure. ALLEGRO_VSYNC is just a suggestion, even with ALLEGRO_REQUIRE, because the driver is in control.

16ms is 60Hz, which indicates vsync.

Thread #618223. Printed from Allegro.cc