I've been developing a game using Allegro 4. Recently I tried switching to Allegro 5 due to some issues which I'll write about in a separate post. However, the performance is much, much worse. In Allegro 4 I was writing all graphics to a memory bitmap, then using AllegroGL to scale the backbuffer to the screen. In Allegro 5 I'm using al_draw_scaled_bitmap to display the backbuffer. With memory bitmaps it's incredibly slow, but even with video bitmaps it's far too slow.
Comparison of profiling results (render1 = write sprites to backbuffer, render2 = scale backbuffer to screen)
(This is on a level where 2000 sprites are being displayed per frame)
Allegro 4:
Render1: 12ms, Render2: 10ms
Allegro 5, new bitmap flags = ALLEGRO_ALPHA_TEST
Render1: 30.9ms Render2: 13.6ms
As above but also with ALLEGRO_NO_PRESERVE_TEXTURE
Render1: 71.4ms Render2: 10.4ms
With ALLEGRO_OPENGL, but without ALLEGRO_NO_PRESERVE_TEXTURE
Render1: 95.9ms Render2: 15.3ms
Turned off alpha channel (will be ok for most sprites):
Render1: 28.9ms Render2: 15.1ms
Attached is a screenshot of the level. This level is much bigger than the standard one, but even on smaller levels the performance is terrible.
The tile sprites, which make up the majority of the sprites in this image, are 32x32.
This sounds symptomatic of memory bitmaps being used elsewhere (i.e. as part of your pipeline, not just via texture preservation). Allegro 5 isn't good at memory bitmap operations.
So - are you using memory bitmaps anywhere in your Allegro 5 code when you say "even with video bitmaps it's far too slow"?
There are only 2 calls to al_create_bitmap and 1 call to al_load_bitmap in the entire codebase, and al_set_new_bitmap_flags is called before each of them. If I call al_get_bitmap_flags before each draw, I get 0x410 (ALLEGRO_VIDEO_BITMAP | ALLEGRO_ALPHA_TEST) for most of them, but the display gets 0x400 (ALLEGRO_VIDEO_BITMAP).
It's likely you're doing something wrong. Weird drawing code. Locking bitmaps (which forces copying to memory). Etc.
I easily get 100x performance with Allegro 5 with a similar tiled bitmap game on a tiny netbook.
Is this Linux, Windows, or Mac OS X?
I believe you have likely done some testing already, but just in case this helps: if you are working with tilemaps and Allegro 5, a good way to speed things up (considerably) is to modify your tilemap drawing code to use vertex buffer objects (you can check a non-optimized version of this here: https://github.com/mikiZX/Allegro5-2d-Platformer-Demo-using-VBOs-and-Tiled-tilemaps ).
You could also optimize this (and likely your actual code) by segmenting your tilemap into smaller sections and only drawing the ones which are actually visible on screen (i.e. many segments of 32x32 tiles), in case you are currently drawing all of them each frame when only part is visible.
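The visible-range computation suggested above can be sketched as plain index arithmetic. This is a hypothetical helper (the name and the 32-pixel tile size are my assumptions, not from the project):

```c
/* Hypothetical sketch: given a camera position and width in pixels, compute
   the inclusive range of tile columns that are actually on screen, clamped
   to the map bounds. The same maths applies to rows. Only tiles inside this
   range need to be drawn each frame. */
static void visible_tile_range(int cam_x, int cam_w, int map_cols,
                               int tile_size, int *first, int *last)
{
    *first = cam_x / tile_size;                  /* leftmost visible column */
    *last  = (cam_x + cam_w - 1) / tile_size;    /* rightmost visible column */
    if (*first < 0) *first = 0;                  /* clamp to map edges */
    if (*last >= map_cols) *last = map_cols - 1;
}
```

The draw loop then iterates only first..last in each axis instead of the whole map.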
As suggested, with Allegro 5 there are drawing operations on video bitmaps that require locking the bitmap before the operation is performed (e.g. al_put_pixel). If you use al_put_pixel many times without locking the bitmap first, this can slow things down a lot.
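A minimal sketch of the batched-lock pattern described above (untested here, and the function name and fill pattern are made up; the point is the single lock/unlock pair around all the pixel writes):

```c
#include <allegro5/allegro.h>

/* Sketch: perform all pixel writes under one explicit lock rather than
   letting each al_put_pixel on a video bitmap pay the locking cost itself. */
void fill_noise(ALLEGRO_BITMAP *bmp)
{
    al_set_target_bitmap(bmp);
    /* One lock for the whole batch; WRITEONLY avoids reading pixels back. */
    al_lock_bitmap(bmp, ALLEGRO_PIXEL_FORMAT_ANY, ALLEGRO_LOCK_WRITEONLY);
    for (int y = 0; y < al_get_bitmap_height(bmp); y++)
        for (int x = 0; x < al_get_bitmap_width(bmp); x++)
            al_put_pixel(x, y, al_map_rgb(x & 255, y & 255, 0));
    al_unlock_bitmap(bmp);
}
```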
I am on Windows. I am not calling al_lock_bitmap, and all put/get_pixel calls are commented out.
I have done some more tests, and found that even when nothing was being drawn, a single call to OutputDebugString (Microsoft's function for writing to Visual Studio's debug output) caused the enclosing profiling region to randomly vary between low numbers (eg 0.5ms, still not great for a single debug print but not out of the ordinary for that function) and very high numbers, sometimes over 10ms - see attached image.
Problem solved? I re-enabled the drawing code and commented out OutputDebugString, but performance was no better - still in the region of 40-80ms per frame. Both drawing the sprites to the backbuffer, and drawing the backbuffer to the screen, vary by up to a factor of 2, although the latter is more stable.
Does it matter that the backbuffer does not have power of 2 width/height?
For sure the Allegro wizards that follow this forum will be able to provide a definite answer but I've searched Allegro's GitHub repository for 'power-of-two' and 'pot' and it says this (seems to be the case for both OpenGL and DirectX):
"* Also, we do support old OpenGL drivers where textures must have
* power-of-two dimensions. If a non-power-of-two bitmap is created in
* such a case, we use a texture with the next larger POT dimensions,
* and just keep some unused padding space to the right/bottom of the
* pixel data."
So it appears this is handled by Allegro internally so should not be the issue here.
I would try profiling your code to pinpoint the exact section that runs slow. There are likely better ways, but being a noob, I would keep a global variable and store the system time in it every so often throughout the code; each time, just before storing a new value, check the difference between the current time and the variable, and if the difference is large enough, print a debug message.
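The poor-man's watchdog described above might look like this sketch (the function name and millisecond units are my own; the caller supplies timestamps from whatever clock is available, e.g. QueryPerformanceCounter or al_get_time() * 1000):

```c
/* Remember the last checkpoint time and flag any suspiciously large gap. */
static double last_checkpoint_ms = 0.0;

/* Returns 1 if more than threshold_ms elapsed since the previous checkpoint,
   so the caller knows to print a debug message identifying the slow span. */
int checkpoint(double now_ms, double threshold_ms)
{
    double elapsed = now_ms - last_checkpoint_ms;
    last_checkpoint_ms = now_ms;
    return elapsed > threshold_ms;
}
```

Sprinkling checkpoint() calls through the frame narrows down which span is eating the time.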
In Allegro 5 you just don't use memory bitmaps, hardly ever.
The A5 way is drawing from a tile atlas. You have about 22 unique tiles on that screen at 32x32; arranged 2x11, that's 64x352 pixels, which would easily fit in an atlas of 512x512 or even smaller.
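Locating a tile inside such an atlas is simple arithmetic. A hedged sketch (the helper name and row-major layout are my assumptions); the resulting rectangle is what you would hand to al_create_sub_bitmap() or al_draw_bitmap_region() so every tile draw samples the same underlying texture:

```c
/* Sketch: find tile number `id` in an atlas laid out in rows of `cols`
   tiles, each `ts` pixels square. Writes the pixel position of the tile's
   top-left corner into (*x, *y). */
static void atlas_rect(int id, int cols, int ts, int *x, int *y)
{
    *x = (id % cols) * ts;  /* column within the row */
    *y = (id / cols) * ts;  /* row within the atlas */
}
```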
And of course OutputDebugString will slow your program down writing to the console like that. It's much faster to write to a file.
EDIT
Show game loop and rendering code.
The A5 way is drawing from a tile atlas
What's the difference between using a 512x512 atlas (together with sub-bitmaps, I assume) and using 22 different source bitmaps? Does it affect performance?
Also, if I remember correctly, sub-bitmaps don't support tiling, so when you need tiling you can't use them.
Show game loop and rendering code
I second this; it could be useful for doing a benchmark on different systems. I've noticed that Allegro 5 performance drops when drawing lots of bitmaps, but I always blamed it on having an old graphics card and using open source drivers. I've never done any serious benchmarking, but I've noticed CPU usage increasing considerably when drawing lots of bitmaps, which always seemed a bit off to me considering that Allegro 5 is supposed to do its drawing on the GPU. As I said, though, I've never done any serious testing, and just considered using the low-level primitives in cases where you need to draw lots of bitmaps.
What's the difference between using a 512x512 atlas (together with sub-bitmaps, I assume) and using 22 different source bitmaps? Does it affect performance?
Using 22 separate bitmaps probably won't be an issue, but once you start getting into the hundreds, I'd imagine you'd start seeing a serious dip in performance (or increased memory usage). I'm being quite unscientific here though; I'm just aware that it's better to have fewer discrete textures.
I've never done any serious testing and just considered using the low level primitives in cases when you need to draw lots of bitmaps.
Yes indeed - it'll be even faster if you're using vertex buffers.
What's the difference between using a 512x512 atlas (together with sub-bitmaps, I assume) and using 22 different source bitmaps? Does it affect performance?
Yes! IIRC, drawing from different source bitmaps is the equivalent of several OpenGL texture changes. This will slow down drawing and should be avoided where possible (e.g. use an atlas with sub-bitmaps, or at least sort draws by source bitmap).
So it's not about the number of source bitmaps but the number of switches between them. IIRC, this should be addressed before attempting to use vertex buffers.
The GPU is happiest if you set up "state" once and then only send geometry or alter matrices.
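One low-effort way to cut down state changes in Allegro 5, short of writing vertex buffer code, is al_hold_bitmap_drawing(), which batches consecutive draws that share a source texture into one GPU submission. A sketch (untested; the 16-column atlas layout and function name are assumptions):

```c
#include <allegro5/allegro.h>

/* Sketch: draw every tile of a map from one atlas inside a single "hold",
   so Allegro can batch all the quads instead of issuing them one by one. */
void draw_tiles(ALLEGRO_BITMAP *atlas, const int *map, int cols, int rows)
{
    al_hold_bitmap_drawing(true);
    for (int y = 0; y < rows; y++) {
        for (int x = 0; x < cols; x++) {
            int id = map[y * cols + x];
            /* Source rectangle inside the atlas (16 tiles per row, 32px). */
            al_draw_bitmap_region(atlas, (id % 16) * 32, (id / 16) * 32,
                                  32, 32, x * 32, y * 32, 0);
        }
    }
    al_hold_bitmap_drawing(false);  /* flush the batch */
}
```

Note the batching only helps while all held draws come from the same texture, which is exactly what an atlas guarantees.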
I've never done any serious benchmark but I've noticed CPU usage increasing considerably when drawing lots of bitmaps, which always seemed a bit off to me considering that Allegro 5 is supposed to do its drawing on the GPU
I doubt this would be due to your graphics card (unless it is maybe over 15 years old). The likely reason CPU usage increases is 'feeding' the GPU with the data it needs to draw: the CPU-GPU transfer becomes the bottleneck, as the CPU cannot feed data to the GPU at the rate the GPU can draw it. This, I believe, is the main reason one would want the CPU to send data only once to the GPU (using a VBO), as opposed to sending it many times (once per bitmap you wish to draw). From my experience (not very extensive, mind you) it is faster to give the CPU the task of re-creating the VBO each frame and then drawing that, than to send each bitmap individually. The atlas idea comes in because one VBO draw call is bound to only one bitmap (or a single set of bitmaps if a special shader is used); if you want to draw different bitmaps with VBOs, you need one VBO per bitmap, which again increases the amount of CPU-GPU traffic. Thus packing all your bitmaps into a single one and issuing only one VBO draw call will/should be the fastest way.
Also, what was said above about context changes is a factor too, one again mitigated by using a single VBO.
As for tiling, this can be achieved with VBOs, though it would require adding extra geometry to simulate the tiling effect (as opposed to simply increasing the UV coordinates, which I believe is what one would normally do, if I understand your question correctly).
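For reference, the per-tile geometry for the triangle-list approach looks something like this sketch (untested; six vertices per quad, and Allegro's primitives addon takes UV coordinates in pixels of the source texture):

```c
#include <allegro5/allegro.h>
#include <allegro5/allegro_primitives.h>

/* Sketch: fill six ALLEGRO_VERTEX entries (two triangles) for one 32x32
   tile at screen position (x, y), sampling the atlas at pixel (u, v). */
static void emit_tile(ALLEGRO_VERTEX *q, float x, float y, float u, float v)
{
    ALLEGRO_COLOR white = al_map_rgb(255, 255, 255);
    float xs[6] = { x, x + 32, x,      x + 32, x,      x + 32 };
    float ys[6] = { y, y,      y + 32, y,      y + 32, y + 32 };
    float us[6] = { u, u + 32, u,      u + 32, u,      u + 32 };
    float vs[6] = { v, v,      v + 32, v,      v + 32, v + 32 };
    for (int i = 0; i < 6; i++) {
        q[i].x = xs[i]; q[i].y = ys[i]; q[i].z = 0;
        q[i].u = us[i]; q[i].v = vs[i]; q[i].color = white;
    }
}

/* After filling an array with emit_tile() for every visible tile, the whole
   map goes out in one call bound to the atlas texture:
   al_draw_prim(verts, NULL, atlas, 0, n_verts, ALLEGRO_PRIM_TRIANGLE_LIST); */
```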
If you don't need tiling, atlas'es and sub bitmaps are the way to go.
If you need tiling, you need either one texture per tile, or a shader.
Well, that's some interesting info on VBOs which is not immediately obvious when using Allegro 5, particularly if you come from Allegro 4. I always thought 'hardware accelerated' meant the graphics card would take care of all the drawing with little to no load on the CPU. I also assumed that loading a bitmap as a video bitmap meant it was loaded as a texture in video memory, from where the graphics card could access and draw it with no CPU load either, even when using the traditional Allegro 4 way of drawing things. But apparently it is passing the coordinates/geometry to the GPU that can slow down the process? This requires a whole new way of thinking about the drawing process in Allegro 5.
1. Can you just post the whole project so we can profile it?
2. What are your hardware specs?
I have no problems holding 60 FPS... on a netbook... with an i3 celeron and intel HD graphics while drawing multiple tile layers (floors, walls, "decals", "dirt", and lighting layers) in 1366x768.
I don't use VBOs or display lists or anything "fancy".
It's even possible you're measuring wrong. If you're running Windows, use one of those nVidia (or Windows [Windows Key + G], or whatever) FPS meters. Linux might have an equivalent.
Also, what's your video mode? You're not in some kind of creepy 24-bit mode? (somehow different than your images in memory, forcing a color conversion every frame.)
Letting other people compile and profile your project will eliminate many of these variables.
I've done some more profiling and I'm getting better results than I had been getting before – so it's possible there was something else going on like a rogue logging call somewhere.
Here are some profiling results from a simplified test where it draws around 2000 32x32 sprites per frame. Render1 is drawing the sprites and Render2 is scaling the backbuffer to the screen. In this test, the screen resolution is 1680x1050 and the back buffer is 1888x1120 (although most levels will be considerably smaller than this).
Allegro 5
Render1: 10.0ms Render2: 18.8ms Frame: 28.8ms
Render1: 9.4ms Render2: 20.9ms Frame: 30.3ms
Render1: 10.0ms Render2: 20.3ms Frame: 30.3ms
Render1: 8.8ms Render2: 23.8ms Frame: 32.6ms
Render1: 8.6ms Render2: 16.9ms Frame: 25.5ms
Render1: 9.9ms Render2: 18.9ms Frame: 28.8ms
Render1: 9.7ms Render2: 20.8ms Frame: 30.6ms
Allegro 4 (Render2 uses allegro_gl_make_texture_ex and draws a quad, Render1 uses draw_rle_sprite)
Render1: 13.6ms Render2: 8.9ms Frame: 22.5ms
Render1: 13.6ms Render2: 8.9ms Frame: 22.5ms
Render1: 13.7ms Render2: 8.9ms Frame: 22.5ms
Render1: 13.7ms Render2: 8.9ms Frame: 22.5ms
Render1: 13.6ms Render2: 8.9ms Frame: 22.5ms
Render1: 13.6ms Render2: 8.9ms Frame: 22.5ms
Render1: 13.6ms Render2: 8.9ms Frame: 22.4ms
Render1: 13.6ms Render2: 8.9ms Frame: 22.5ms
In this case you can see that Allegro 5 is faster at drawing the small sprites but slower at scaling the back buffer to the screen as compared to the AllegroGL code I'm using in the old version. I'm going to look into the practicality of scaling the sprites directly to the screen rather than going through the additional render2 step.
The players in my game are drawn by creating a temporary bitmap, copying the relevant head and body parts to the bitmap, and then drawing that to the screen – with additional steps if lighting and/or transparency are needed. In the Allegro 4 version they are also modified in real time using get/put pixel (primarily replacing the default player colour with the desired player colour, but also sometimes for a shield effect which puts a white outline around the player). I know this kind of pixel replacement is a non-starter for Allegro 5. Even without any lighting or transparency effects, in my test the Allegro 5 version averages 8ms to draw 16 players (49 draw sprite calls), while the Allegro 4 version runs about 20% faster. Given that the pixel replacement stuff is not feasible for Allegro 5, I will have to generate and store the sprites at the start of the level, so that should help performance.
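The once-per-level palette swap mentioned above could be as simple as a linear replace over a locked pixel buffer. A sketch (the helper is hypothetical, and exact-match ARGB values are assumed; in Allegro 5 the buffer would come from a locked memory copy of the sprite, with the result blitted into a video bitmap that is then reused every frame):

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch: replace every pixel equal to the default player colour `from`
   with the chosen team colour `to` in an ARGB pixel buffer of n pixels. */
static void recolor(uint32_t *px, size_t n, uint32_t from, uint32_t to)
{
    for (size_t i = 0; i < n; i++)
        if (px[i] == from)
            px[i] = to;
}
```

Doing this once at level load keeps all per-frame drawing on video bitmaps, avoiding the per-pixel work that was cheap on Allegro 4 memory bitmaps but isn't in Allegro 5.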
Both machines I've tested on are around six or seven years old and have Intel on-board graphics. Timing uses QueryPerformanceCounter.
Something strange I noticed is that when I request a list of graphics modes in Allegro 5, all the modes are reported as having 23 bit colour depth. However, when I print the actual bitmap colour depths, they are all 32.
Just to be absolutely sure: what frame rates are you getting? Could any of this be vsync-related? I notice that all of the 'Render2' timings are over 16.6ms (some of them very close) which is close to the frame timing of a 60Hz display with vsync: 1000ms / 60Hz = 16.6ms.
vsync is off. I just tested it on a much better PC and got these timings (window resolution was 1768x992):
Render1: 6.0ms Render2: 3.9ms Frame: 9.9ms
Render1: 6.0ms Render2: 4.0ms Frame: 10.0ms
Render1: 6.0ms Render2: 4.0ms Frame: 10.0ms
Render1: 6.0ms Render2: 3.9ms Frame: 9.9ms
Render1: 6.1ms Render2: 4.0ms Frame: 10.1ms
Render1: 6.0ms Render2: 4.0ms Frame: 10.0ms
Render1: 6.0ms Render2: 4.2ms Frame: 10.2ms
Render1: 6.2ms Render2: 3.9ms Frame: 10.2ms
Render1: 6.0ms Render2: 3.9ms Frame: 9.9ms
I'm going to look into the practicality of scaling the sprites directly to the screen rather than going through the additional render2 step
In my recent Allegro 5 project I set up an Allegro transformation so that all drawing is automatically scaled to fit the screen/window size. It seems to work pretty well for 2D.
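The fit-to-window maths behind such a transform is small enough to sketch (the helper name is mine; the three results would feed al_build_transform(&t, ox, oy, scale, scale, 0) followed by al_use_transform(&t), after which all drawing is scaled automatically):

```c
/* Sketch: compute a uniform scale and centring offset that fits a logical
   buffer (bw x bh) into a window (ww x wh), preserving aspect ratio. */
static void fit_transform(int bw, int bh, int ww, int wh,
                          float *scale, float *ox, float *oy)
{
    float sx = (float)ww / bw;
    float sy = (float)wh / bh;
    *scale = sx < sy ? sx : sy;        /* limit by the tighter axis */
    *ox = (ww - bw * *scale) / 2;      /* centre horizontally */
    *oy = (wh - bh * *scale) / 2;      /* centre vertically */
}
```

A non-integer scale can still produce the one-pixel tile seams mentioned later in the thread, so snapping the scale to a whole number (when close) may look better for pixel art.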
I just tested it on a much better PC and got these timings
Is the Allegro 4 timing still better?
Timings from a Paperspace cloud PC, which is faster than mine:
al5
Render1: 9.0ms Render2: 4.9ms Frame: 13.9ms
Render1: 9.8ms Render2: 5.1ms Frame: 14.9ms
Render1: 11.0ms Render2: 5.5ms Frame: 16.5ms
Render1: 10.9ms Render2: 4.4ms Frame: 15.3ms
Render1: 9.2ms Render2: 4.3ms Frame: 13.5ms
Render1: 8.7ms Render2: 4.7ms Frame: 13.4ms
Render1: 8.7ms Render2: 4.4ms Frame: 13.2ms
Render1: 9.8ms Render2: 4.1ms Frame: 13.9ms
Render1: 9.3ms Render2: 4.0ms Frame: 13.3ms
Render1: 9.1ms Render2: 4.1ms Frame: 13.2ms
Render1: 9.8ms Render2: 4.6ms Frame: 14.4ms
Render1: 9.8ms Render2: 5.6ms Frame: 15.4ms
Render1: 9.3ms Render2: 4.0ms Frame: 13.3ms
al4
Render1: 11.0ms Render2: 2.6ms Frame: 13.6ms
Render1: 11.0ms Render2: 2.6ms Frame: 13.6ms
Render1: 10.9ms Render2: 2.6ms Frame: 13.5ms
Render1: 10.8ms Render2: 2.5ms Frame: 13.4ms
Render1: 10.8ms Render2: 2.5ms Frame: 13.3ms
Render1: 10.8ms Render2: 2.5ms Frame: 13.2ms
Render1: 10.7ms Render2: 2.4ms Frame: 13.2ms
Render1: 10.6ms Render2: 2.4ms Frame: 13.1ms
Render1: 10.6ms Render2: 2.4ms Frame: 13.0ms
Render1: 10.5ms Render2: 2.4ms Frame: 13.0ms
EDIT: I originally wrote that the Al5 version varies a lot in frame timings, but I just realised that the Al4 version is showing smoothed-out timings (this text was originally written to the screen and is hard to read if the numbers vary a lot every frame), whereas I changed this in the Al5 version.
I tried scaling sprites directly to the screen rather than using an intermediate backbuffer. These are the timings I got on my desktop - a little better than before but not much.
Render1: 8.6ms Render2: 13.9ms Frame: 22.6ms
Render1: 10.0ms Render2: 18.2ms Frame: 28.2ms
Render1: 8.4ms Render2: 22.7ms Frame: 31.1ms
Render1: 8.8ms Render2: 15.6ms Frame: 24.4ms
Render1: 11.2ms Render2: 16.8ms Frame: 28.1ms
Render1: 9.3ms Render2: 20.6ms Frame: 29.9ms
Render1: 8.6ms Render2: 15.0ms Frame: 23.6ms
The problem is that the scaling ratio is not a whole number, so in this level I end up with a 1 pixel gap every 2 or 3 tiles.
Download my binaries and try ex_draw_bitmap after running RunA525Examples.bat from a command line.
https://bitbucket.org/bugsquasher/unofficial-allegro-5-binaries/downloads/
I get 60fps +- 2, varies between around 1200 and 2500 / sec.
With 1024 sprites, 29fps +- 0, 140-230/sec.
Ok, second question. Are you running on an integrated GPU or a dedicated card. What are your CPU specs and GPU model(s). Have you updated your drivers since purchasing the card?
EDIT
Also, those numbers are pretty bad. Have you exceeded the maximum texture size and inadvertently produced a memory bitmap? How old is your gpu?
Intel Core i5-3570K with integrated GPU, latest drivers
https://ark.intel.com/content/www/us/en/ark/products/65520/intel-core-i5-3570k-processor-6m-cache-up-to-3-80-ghz.html
As mentioned previously, I put in some code to confirm that none of the bitmaps being drawn to are memory bitmaps.
but...
I just tried ALLEGRO_NO_PRESERVE_TEXTURE, having tried it a while back, and now I get vastly better Render2 performance.
Render1: 8.5ms Render2: 0.5ms Frame: 9.0ms
Render1: 8.6ms Render2: 0.6ms Frame: 9.2ms
Render1: 9.3ms Render2: 0.7ms Frame: 9.9ms
Render1: 8.3ms Render2: 0.7ms Frame: 9.0ms
Render1: 8.6ms Render2: 0.6ms Frame: 9.3ms
Render1: 8.7ms Render2: 0.6ms Frame: 9.3ms
Render1: 8.5ms Render2: 0.5ms Frame: 9.1ms
Those numbers are much better. With the D3D driver, Allegro automatically tries to back up textures, which can be very slow. Try the OpenGL driver and see what your numbers are like.
95% of all problems would be solved by posting the project file instead of just guessing into a black box.
I have uploaded a project here:
https://drive.google.com/file/d/106m6ZxFLGTzWhnULdlzYuJM172r8dgkp/view?usp=sharing
What I've found is that for this test, performance is much worse in OpenGL, as compared to Direct3D with ALLEGRO_NO_PRESERVE_TEXTURE set on the backbuffer. The Render1 step which draws lots of small sprites takes about 24ms, compared with 8 for Direct3D.
If you want to try it, the key lines to look at are:
Psector.cpp 15, 441, 469, 499
Sprites.cpp 20-23, 76, 159
I installed visual studio 2019 and loaded your project and got this error
C:\Users----\Downloads\psector.ultrasimplifiedprofiling\psector.vcxproj : error : Error HRESULT E_FAIL has been returned from a call to a COM component.
sigh... why does this stuff never work the first time.
</entry>
<entry>
<record>402</record>
<time>2020/09/08 11:35:07.956</time>
<type>Information</type>
<source>VisualStudio</source>
<description>Begin package load [Visual C++ Package]</description>
<guid>{1B027A40-8F43-11D0-8D11-00A0C91BC942}</guid>
</entry>
<entry>
<record>403</record>
<time>2020/09/08 11:35:07.957</time>
<type>Error</type>
<source>VisualStudio</source>
<description>LegacySitePackage failed for package [Visual C++ Package]Source: 'mscorlib' Description: ValueFactory attempted to access the Value property of this instance.
System.InvalidOperationException: ValueFactory attempted to access the Value property of this instance.
 at System.Lazy`1.CreateValue()
--- End of stack trace from previous location where exception was thrown ---
 at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
 at System.Lazy`1.get_Value()
 at Microsoft.VisualStudio.VC.CppSvc.get_IVCPreferences()
 at Microsoft.VisualStudio.VC.ManagedInterop.Initialize(IServiceProvider serviceProvider)</description>
<guid>{1B027A40-8F43-11D0-8D11-00A0C91BC942}</guid>
<hr>80131509</hr>
<errorinfo></errorinfo>
</entry>
<entry>
<record>404</record>
<time>2020/09/08 11:35:07.959</time>
<type>Error</type>
<source>VisualStudio</source>
<description>SetSite failed for package [Visual C++ Package](null)</description>
<guid>{1B027A40-8F43-11D0-8D11-00A0C91BC942}</guid>
<hr>80131509</hr>
<errorinfo></errorinfo>
</entry>
<entry>
<record>405</record>
<time>2020/09/08 11:35:07.961</time>
<type>Error</type>
<source>VisualStudio</source>
<description>End package load [Visual C++ Package]</description>
<guid>{1B027A40-8F43-11D0-8D11-00A0C91BC942}</guid>
<hr>80131509</hr>
<errorinfo></errorinfo>
</entry>
<entry>
<record>406</record>
<time>2020/09/08 11:35:07.964</time>
<type>Information</type>
<source>VisualStudio</source>
<description>Begin package load [Visual C++ Package]</description>
<guid>{1B027A40-8F43-11D0-8D11-00A0C91BC942}</guid>
</entry>
<entry>
<record>407</record>
<time>2020/09/08 11:35:07.964</time>
<type>Error</type>
<source>VisualStudio</source>
<description>LegacySitePackage failed for package [Visual C++ Package]Source: 'mscorlib' Description: ValueFactory attempted to access the Value property of this instance.
System.InvalidOperationException: ValueFactory attempted to access the Value property of this instance.
 at System.Lazy`1.CreateValue()
--- End of stack trace from previous location where exception was thrown ---
 at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
 at System.Lazy`1.get_Value()
 at Microsoft.VisualStudio.VC.CppSvc.get_IVCPreferences()
 at Microsoft.VisualStudio.VC.ManagedInterop.Initialize(IServiceProvider serviceProvider)</description>
<guid>{1B027A40-8F43-11D0-8D11-00A0C91BC942}</guid>
<hr>80131509</hr>
<errorinfo></errorinfo>
</entry>
</activity>
... C++ didn't install right maybe? I have no idea why I'm a magnet for such obscure errors.
[edit] hmmm........ maybe a combination of visual studio "wanting to restart to finish" + windows feature update applying on restart clobbered a dependency. I bet that's it.
GG Microsoft.
[edit] it compiles now.
Which version of MSVC 2019 are you on?
Here they suggest updating and deleting the .vs folder
https://developercommunity.visualstudio.com/content/problem/526997/vs2019-update-to-1601-broke-something.html
[edit] nvm it's a user pragma
Well this looks wrong. I sure as heck don't have a "23bit" or "13bit" color screen.
That's fine, it's just a reminder to me that I need to update that code at some point.
One thing I notice is
al_set_target_bitmap(dest);
called for every sprite. That may incur a cost (and the driver update may have reduced that cost). If the target is always a render buffer, or the screen, etc., you should only be calling that once [per actual need], as the driver may be using the changing [current target] to decide when to cache / move memory around. (Complete guess, but it's definitely different from how I've ever coded my projects.)
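A sketch of that suggestion (untested; wraps al_set_target_bitmap behind a cached pointer so redundant calls become a cheap comparison):

```c
#include <allegro5/allegro.h>

/* Sketch: remember the current render target and only call
   al_set_target_bitmap when it actually changes, since the call may flush
   batched draws or shuffle driver state. */
static ALLEGRO_BITMAP *current_target = NULL;

static void set_target_cached(ALLEGRO_BITMAP *dest)
{
    if (dest != current_target) {
        al_set_target_bitmap(dest);
        current_target = dest;
    }
}
```

The cache must be reset (set to NULL) if anything else changes the target behind its back, e.g. al_flip_display or direct al_set_target_bitmap calls elsewhere.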
I mentioned in a previous post:
"Something strange I noticed is that when I request a list of graphics modes in Allegro 5, all the modes are reported as having 23 bit colour depth. However, when I print the actual bitmap colour depths, they are all 32."
This is the code which prints the modes:
profile shows
7% of time was spent just in al_set_target() on my machine
14% on al_draw_bitmap
though i'm not used to this MSVC UI for profiling
[edit]
24% was in waiting for events, so either it's vsync or it's set to only draw at a fixed rate.
[edit]
I set al_set_new_display_option(ALLEGRO_VSYNC, 2, ALLEGRO_REQUIRE); //vsync off
600: Advance: 0.0ms Render1: 7.0ms Render2: 0.5ms Frame: 7.4ms
600: Advance: 0.0ms Render1: 7.5ms Render2: 0.5ms Frame: 8.0ms
600: Advance: 0.0ms Render1: 7.9ms Render2: 0.3ms Frame: 8.3ms
600: Advance: 0.0ms Render1: 9.9ms Render2: 0.3ms Frame: 10.3ms
600: Advance: 0.0ms Render1: 7.3ms Render2: 0.3ms Frame: 7.6ms
600: Advance: 0.0ms Render1: 9.5ms Render2: 0.3ms Frame: 9.8ms
600: Advance: 0.0ms Render1: 7.3ms Render2: 0.3ms Frame: 7.7ms
600: Advance: 0.0ms Render1: 9.5ms Render2: 0.5ms Frame: 10.0ms
600: Advance: 0.0ms Render1: 7.9ms Render2: 0.4ms Frame: 8.2ms
600: Advance: 0.0ms Render1: 7.7ms Render2: 0.3ms Frame: 8.0ms
I'm getting over 100 FPS even with al_set_target_bitmap (still uses 8% cpu time!). Granted, it's a GTX 1060.
Still, it's showing plenty of time taken in the event queue so maybe I'll need to force no vsync with my drivers. That, or you did something wrong with the event queue/timers and it's forcing it to wait.
[edit] Still blowing 25% time waiting for events with vsync forced off. Take a look through in your event handler / timing code when you can, I'm out of time for the moment.
You know, the second I look at that, it looks completely normal. I was expecting something more... squirrelly.
[edit]
Wait, no... how... if you're waiting for Allegro to fire off an event, it's literally impossible to exceed that speed. I don't think that's how you'd set up timing for a benchmark.
Here you set a timer based on the refresh rate of the screen:
//psector.cpp:232
alTimer = al_create_timer(1.0 / update_rate);
al_register_event_source(eventQueue, al_get_timer_event_source(alTimer));
al_start_timer(alTimer);
and that's the only thing triggering the exit from the while loop above.
p.s. profilers rule.
Great work CK!
Just one thing,
Well this looks wrong. I sure as heck don't have a "23bit" or "13bit" color screen.
The format member is not a bit depth; it's a pixel format, one of these. So 13 is ALLEGRO_PIXEL_FORMAT_RGB_565 and 23 is ALLEGRO_PIXEL_FORMAT_XRGB_8888. Hopefully that makes more sense.
The game runs at a maximum of 60fps. The timings I've been looking at are those measured by my own timing system, which measures the render1 and render2 times and outputs them once the level has been running for 10 seconds. You're right in saying that this system isn't ideal for profiling using the MSVC profiler. Commenting out "if (timer_value > 0)" and "if (timer_value == 0 || ++framesSkipped == 4)" should allow it to run flat-out.
UPDATE:
I added the optimisation to only call al_set_target_bitmap when it needs to change. I see a boost to Render1 performance but strangely, al_flip_display consistently takes longer. I have split it out from the Render2 timing, which confirms that all the Render2 time is taken up by al_flip_display.
Direct3D

Without targetBitmap optimisation
Render1: 9.3ms Render2: 0.0ms Flip: 0.5ms Frame: 9.8ms
Render1: 9.2ms Render2: 0.0ms Flip: 0.6ms Frame: 9.9ms
Render1: 9.3ms Render2: 0.0ms Flip: 0.7ms Frame: 9.9ms
Render1: 8.9ms Render2: 0.0ms Flip: 0.5ms Frame: 9.5ms
Render1: 9.0ms Render2: 0.0ms Flip: 0.5ms Frame: 9.5ms

With targetBitmap optimisation
Render1: 6.5ms Render2: 0.0ms Flip: 0.9ms Frame: 7.5ms
Render1: 6.9ms Render2: 0.0ms Flip: 1.0ms Frame: 7.9ms
Render1: 6.9ms Render2: 0.0ms Flip: 0.9ms Frame: 7.9ms
Render1: 7.4ms Render2: 0.0ms Flip: 1.0ms Frame: 8.4ms
Render1: 6.7ms Render2: 0.0ms Flip: 1.0ms Frame: 7.7ms
I also added an option for it to run flat out, without waiting for a timer event. When I enable that, I get these results:
With RUN_FLAT_OUT and targetBitmap optimisation
Render1: 7.9ms Render2: 0.0ms Flip: 8.0ms Frame: 16.0ms
Render1: 7.6ms Render2: 0.0ms Flip: 8.0ms Frame: 15.7ms
Render1: 7.4ms Render2: 0.0ms Flip: 7.7ms Frame: 15.2ms
Render1: 7.4ms Render2: 0.0ms Flip: 7.3ms Frame: 14.7ms
Render1: 6.0ms Render2: 0.0ms Flip: 8.7ms Frame: 14.8ms
This is with vsync off using ALLEGRO_REQUIRE, although as you say the driver does not necessarily honour that.
Updated project:
https://drive.google.com/file/d/1RfEbTlI3yn2lzxaClDGe9Sb3j0ZYBPO9/view?usp=sharing
See these lines in updated project:
#define RUN_FLAT_OUT 0 (psector.cpp)
#define SET_TARGET_BITMAP_OPTIMISATION 1 (sprites.cpp)
I added the optimisation to only call al_set_target_bitmap when it needs to change. I see a boost to Render1 performance but strangely, al_flip_display consistently takes longer. I have split it out from the Render2 timing, which confirms that all the Render2 time is taken up by al_flip_display.
That still does seem like vsync (or another hard limiter). So the question now isn't so much performance but "is there something wrong in your code, or does everyone else have the same FPS limit when using Allegro on Windows?"
Also, do you have Intel, AMD, or nVidia hardware?
Thanks,
--Chris
[edit] I dev in Linux and I've never had these problems so I'm guessing it's a Windows only issue unless there's another squirrelly timing issue hidden.
Intel Core i5-3570K with integrated Intel HD Graphics 4000 GPU
Is it a laptop, with some sort of power saving mode?
This is with vsync off using ALLEGRO_REQUIRE, although as you say the driver does not necessarily honour that.
Oddly enough, with REQUIRE, it's supposed to fail/crash if it can't force it. But who knows if that actually works.
No, it's a desktop.
Allegro can't always override the driver's decision about VSYNC. You need to disable VSYNC in your driver for your application to be sure. ALLEGRO_VSYNC is just a suggestion, even with ALLEGRO_REQUIRE, because the driver is in control.
16ms is 60HZ, which indicates VSYNC.