A5 / GPU programming. What's the strategy?
Chris Katko

What do you minimize? What do you maximize? What is the plan when it comes to getting lots of performance from a 2-D GPU accelerated game?

Arthur Kalliokoski

How could a 2D game possibly need optimization like that? It's the reason 1990 era games were 2D, even a 286 had power to burn for those.

Kris Asick

1. Switch the active bitmap/texture as few times as possible between rendering calls. When using sub-bitmaps, draw everything you can from one source bitmap first before drawing stuff from another. Each switch of the active bitmap/texture is a performance hit that can become drastic if you're doing it too many times per frame.

1b. In Allegro 5, use al_hold_bitmap_drawing() to make this kind of optimization to your rendering pipeline even MORE high-performance!

2. Allegro 5 has the ability to draw low-level primitives with al_draw_prim(). However, there's a large per-call overhead, so calling it more than just a few times per frame can kill your framerate. If you must use it, group all your geometry into a single large vertex array and draw that array in its entirety with a single call to al_draw_prim(). (See the sketch after this list for points 1, 1b and 2.)

3. The Z-Buffer still has its uses in 2D rendering, because you can render things at different Z depths to obscure other things and be able to draw stuff out of order without affecting the visual quality. Plus, if you don't need to use the Z-Buffer you can turn it off for a very small performance boost. (I think it's off by default in A5.) You'll still need to manually order translucent entities though, just like in a 3D game.

4. If you absolutely must draw individual pixels for whatever reason, you need to use a fragment shader; otherwise every pixel you draw costs about as much CPU time as drawing a full-screen bitmap, because the per-call overhead is the same regardless of size. Fragment shaders give you direct access to the raw power of the GPU without the CPU getting in the way and let you work at the texel level. The catch is that, with the CPU out of the loop, you don't have much access to data outside of what's already on the GPU. (A shader sketch follows at the end of this post.)
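To make points 1, 1b and 2 concrete, here's a minimal sketch. The names (atlas, frames) and the 32-pixel sprite size are invented for illustration:

#include <allegro5/allegro.h>
#include <allegro5/allegro_primitives.h>

/* Points 1 & 1b: address sprites as sub-bitmaps of one atlas, and hold
   drawing so consecutive draws from the same texture get batched. */
void draw_sprites(ALLEGRO_BITMAP *frames[], int n)
{
   /* Each frames[i] was created up front with something like:
      al_create_sub_bitmap(atlas, i * 32, 0, 32, 32); */
   al_hold_bitmap_drawing(true);
   for (int i = 0; i < n; i++)
      al_draw_bitmap(frames[i], i * 32.0f, 100.0f, 0); /* no texture switch */
   al_hold_bitmap_drawing(false);
}

/* Point 2: fill ONE vertex array (6 vertices = 2 triangles per quad)
   and issue ONE al_draw_prim() call instead of one call per shape. */
void draw_batched(const ALLEGRO_VERTEX *v, int nverts, ALLEGRO_BITMAP *atlas)
{
   al_draw_prim(v, NULL, atlas, 0, nverts, ALLEGRO_PRIM_TRIANGLE_LIST);
}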

That's just off the top of my head, but yeah, there are definitely steps you can take to ensure you get full performance out of your games, and there can be some serious framerate penalties for not taking them. For instance, the first time I wrote a hardware accelerated mapping system, it was only getting a framerate of about 10ish, because I had no idea I shouldn't be constantly switching the primary texture. (This was not with Allegro but a different library.) I shifted all my textures onto a single main texture and my framerate shot up to 540. ;D
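For point 4, here's a rough sketch of the shader route using the 5.1 shader API (the display needs the ALLEGRO_PROGRAMMABLE_PIPELINE flag, and the GLSL body is just an invented example that darkens texels; the al_use_shader signature is as of recent 5.1 releases):

#include <allegro5/allegro.h>

/* This pixel shader runs once per texel on the GPU, so the per-pixel
   work costs no CPU time beyond the one draw call. */
static const char *pixel_src =
   "uniform sampler2D al_tex;\n"
   "varying vec2 varying_texcoord;\n"
   "void main() {\n"
   "   vec4 c = texture2D(al_tex, varying_texcoord);\n"
   "   gl_FragColor = vec4(c.rgb * 0.5, c.a);\n"
   "}\n";

ALLEGRO_SHADER *make_shader(void)
{
   ALLEGRO_SHADER *s = al_create_shader(ALLEGRO_SHADER_AUTO);
   al_attach_shader_source(s, ALLEGRO_VERTEX_SHADER,
      al_get_default_shader_source(ALLEGRO_SHADER_AUTO, ALLEGRO_VERTEX_SHADER));
   al_attach_shader_source(s, ALLEGRO_PIXEL_SHADER, pixel_src);
   al_build_shader(s);
   return s;
}

/* Usage: al_use_shader(s); al_draw_bitmap(...); al_use_shader(NULL); */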

Thomas Fjellstrom

Arthur Kalliokoski said:

How could a 2D game possibly need optimization like that? It's the reason 1990 era games were 2D, even a 286 had power to burn for those.

3D hardware is rather bad at 2D. It wasn't made for large scrolling backgrounds using tons of unique sprites.

SiegeLord

Kris Asick said:

However, there's a large per-call overhead, so calling it more than just a few times per frame can kill your framerate.

If by "killing" it you mean that calling it more than 6530 times per frame drops your frame rate below 60 FPS. By that metric al_draw_bitmap is even less efficient: it takes only 3450 calls per frame to drop below 60 FPS (no bitmap holding) on my system :P (both measured with ex_draw_bitmap).

Kris Asick
SiegeLord said:

If by "killing" it you mean that calling it more than 6530 times per frame drops your frame rate below 60 FPS.

When I first wrote my mapping system using al_draw_prim(), my framerate was about 10 to 12 seconds per frame. :o

I switched to using one of the al_draw_bitmap functions for most of it and STILL wasn't able to get a perfect 60 FPS. Then I was able to condense all of the things I absolutely had to draw with al_draw_prim() into a single call, and now the FPS can get well over 300. ;)

This was a few Allegro 5.0.x versions ago... I think 5.0.6ish, so maybe this has changed since? *shrugs*

Chris Katko

Without optimizing some of my routines, I was down to 3 FPS on my quad core Athlon X4 630 with a GeForce GTX 560 SE video card. I cut things down by separating everything first by unique ship; each ship then has layers (tile, sprite, fire, oxygen, "pretty" = ship bitmap), and I only update a layer when it changes.

{"name":"LAoUSPH.png","src":"\/\/djungxnpq2nug.cloudfront.net\/image\/cache\/b\/3\/b34c3b969a94fb9988b200603edf564e.png","w":1904,"h":968,"tn":"\/\/djungxnpq2nug.cloudfront.net\/image\/cache\/b\/3\/b34c3b969a94fb9988b200603edf564e"}LAoUSPH.png

However, redrawing the tile layer is still ridiculously slow. And the sprite layer is too slow (12-24 FPS) on a 5,120x5,120 bitmap. It's large intentionally to test for speed issues.

It's also intentionally the size of a Space Station 13 map:

{"name":"fGzgBwB.jpg","src":"\/\/djungxnpq2nug.cloudfront.net\/image\/cache\/8\/d\/8da8243adb4f2fb2731b5cedf8f036c8.jpg","w":5294,"h":4532,"tn":"\/\/djungxnpq2nug.cloudfront.net\/image\/cache\/8\/d\/8da8243adb4f2fb2731b5cedf8f036c8"}fGzgBwB.jpg

But it makes sense now that I need to group all my tiles and sprites into one texture to cut down on texture-bind calls. 160x160 is 25,600 tiles, currently each with its own texture bind. Also, the point about putpixel being extremely slow is super helpful. The only thing I'm doing with putpixel is drawing random (but static per game run) stars (white pixels) and moving them relative to the screen to show ship movement. 400 stars were killing my framerate even after the layer optimization; I'm at 200 now, but knowing I can draw them all from one texture is going to help.

Lastly, I've also considered partitioning large ships into segments so I only update the "dirty" segments of, say, 32x32 tiles instead of the whole 160x160 map. That would give (160/32)^2 = 25 segments * (# of layers) for a full-size ship.
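A minimal sketch of that dirty-segment idea; the struct and the 32x32 segment size are assumptions for illustration:

#include <allegro5/allegro.h>

#define SEGS_PER_SIDE 5   /* 160 tiles / 32 tiles per segment */

/* One cached bitmap per segment; redraw a segment's cache only when
   something inside it changed. */
typedef struct {
   ALLEGRO_BITMAP *cache;  /* e.g. 32 tiles * 32 px = 1024x1024 */
   bool dirty;
} Segment;

void redraw_dirty(Segment seg[SEGS_PER_SIDE][SEGS_PER_SIDE],
                  ALLEGRO_DISPLAY *display)
{
   for (int y = 0; y < SEGS_PER_SIDE; y++) {
      for (int x = 0; x < SEGS_PER_SIDE; x++) {
         if (!seg[y][x].dirty)
            continue;
         al_set_target_bitmap(seg[y][x].cache);
         /* ... redraw just this segment's tiles here ... */
         seg[y][x].dirty = false;
      }
   }
   al_set_target_backbuffer(display); /* back to drawing on screen */
}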

Kris Asick

A couple more optimizations to consider:

1. Try to make all your texture sizes powers of 2. Not every video card can properly handle texture sizes that aren't. Also, try not to exceed 4096x4096, otherwise you're going to create severe bottlenecks for low-end or older video cards.

2. Something I mentioned in another thread recently: If you want to do parallaxing starfields, the best approach is to use multiple layers of random pixels on top of each other, each parallaxing at a different speed. Drawing 4 or 5 large starfield layers is going to be easier on the CPU than drawing 500+ individual stars, and since the GPU often outpaces the CPU by huge amounts, it's not like you're gonna be wasting GPU time doing this. ;)
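A minimal sketch of that layered approach, with invented names; a real version would draw each layer twice per axis so the wrap-around is seamless:

#include <allegro5/allegro.h>
#include <math.h>
#include <stdlib.h>

/* Scatter random stars onto an off-screen layer ONCE; after that the
   whole layer is a single bitmap draw per frame. */
ALLEGRO_BITMAP *make_star_layer(int w, int h, int nstars)
{
   ALLEGRO_BITMAP *b = al_create_bitmap(w, h);
   al_set_target_bitmap(b);
   al_clear_to_color(al_map_rgba(0, 0, 0, 0));
   for (int i = 0; i < nstars; i++)
      al_put_pixel(rand() % w, rand() % h, al_map_rgb(255, 255, 255));
   return b;
}

/* Each frame: offset each layer by the camera position scaled by a
   per-layer factor, so distant layers scroll slower (parallax). */
void draw_starfield(ALLEGRO_BITMAP *layers[], const float factor[], int n,
                    float cam_x, float cam_y)
{
   for (int i = 0; i < n; i++) {
      float w = al_get_bitmap_width(layers[i]);
      float h = al_get_bitmap_height(layers[i]);
      al_draw_bitmap(layers[i], -fmodf(cam_x * factor[i], w),
                     -fmodf(cam_y * factor[i], h), 0);
   }
}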

Aikei_c

Draw only what is currently on the screen. And when you zoom out, make things less detailed so that you need to draw fewer bitmaps than your whole map would otherwise require.
Edit: You could also try grouping your tiles into bigger tiles: I see you have a lot of identical tiles in a row. You could group tiles into, for instance, 3x3 or 5x5 sections, so that you can draw each section with one call to al_draw_bitmap instead of 9 or 25 calls.
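A quick sketch of that baking idea, assuming square tiles (names invented; the caller should switch the target back with al_set_target_backbuffer() afterwards):

#include <allegro5/allegro.h>

/* Pre-render an n x n block of the same tile into one bitmap, turning
   n*n draw calls into 1 wherever that block appears on the map. */
ALLEGRO_BITMAP *bake_group(ALLEGRO_BITMAP *tile, int n)
{
   int ts = al_get_bitmap_width(tile);
   ALLEGRO_BITMAP *group = al_create_bitmap(n * ts, n * ts);
   al_set_target_bitmap(group);
   for (int y = 0; y < n; y++)
      for (int x = 0; x < n; x++)
         al_draw_bitmap(tile, x * ts, y * ts, 0);
   return group;
}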

Jonatan Hedborg

One optimization (that I'm not sure you can do in Allegro) would be to store your entire map, assuming it's fairly static (updated infrequently), as a big vertex buffer object. This requires that you use the same texture (atlas), shader, etc. for the whole thing. But once it's uploaded to the GPU, drawing it will be blazingly fast.
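For what it's worth, the 5.1 primitives addon does expose this through al_create_vertex_buffer() and al_draw_vertex_buffer(); a minimal sketch, assuming an ALLEGRO_VERTEX array already filled with the map's quads:

#include <allegro5/allegro.h>
#include <allegro5/allegro_primitives.h>

/* Upload the static map geometry to the GPU once... */
ALLEGRO_VERTEX_BUFFER *upload_map(const ALLEGRO_VERTEX *v, int nverts)
{
   return al_create_vertex_buffer(NULL /* default vertex format */, v,
                                  nverts, ALLEGRO_PRIM_BUFFER_STATIC);
}

/* ...then drawing the whole thing each frame is a single call. */
void draw_map(ALLEGRO_VERTEX_BUFFER *vb, ALLEGRO_BITMAP *atlas, int nverts)
{
   al_draw_vertex_buffer(vb, atlas, 0, nverts, ALLEGRO_PRIM_TRIANGLE_LIST);
}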

Chris Katko
Aikei_c said:

Edit: You could also try grouping your tiles into bigger tiles: I see you have a lot of the same tiles going one after another. You could group tiles into, for instance, 3x3 sections, 5x5 sections etc. so that you could draw them with one call to al_draw_bitmap instead of 9 or 25 calls.

Jonatan Hedborg said:

assuming it's fairly static (updated infrequently),

Actually, it's the opposite, depending on the time scale you're thinking of: the maps, which represent ships, are fully destructible. I've considered doing something to that effect in a 3-D game I was working on before, where cached versions are used until they become dirty, and manual drawing takes over until a helper thread can regenerate the cache.

However, in this game, I'm not too worried about that. I already cache the "map" to a texture and only update the texture as necessary. It's slow to redraw the updates, but blazing fast to draw normally. So my solution might be, as I think I mentioned, to partition the map into zones and only update the dirty zones instead of redrawing the entire map. Additionally, sprites/objects don't count as draws for the map because they're on their own layer. So the biggest update cost is the objects, and objects (as it stands) are far fewer in number than tiles.

Kris Asick said:

1. Try to make all your texture sizes powers of 2. Not every video card can properly handle texture sizes that aren't. Also, try not to exceed 4096x4096, otherwise you're going to create severe bottlenecks for low-end or older video cards.

2. Something I mentioned in another thread recently: If you want to do parallaxing starfields, the best approach is to use multiple layers of random pixels on top of each other, each parallaxing at a different speed. Drawing 4 or 5 large starfield layers is going to be easier on the CPU than drawing 500+ individual stars, and since the GPU often outpaces the CPU by huge amounts, it's not like you're gonna be wasting GPU time doing this. ;)

You are spot on.

hyreia77

Kris Asick said:

3. The Z-Buffer still has its uses in 2D rendering, because you can render things at different Z depths to obscure other things and be able to draw stuff out of order without affecting the visual quality. Plus, if you don't need to use the Z-Buffer you can turn it off for a very small performance boost. (I think it's off by default in A5.) You'll still need to manually order translucent entities though, just like in a 3D game.

I still don't see an 'easy way' to use the Z-Buffer. From what I've seen in the forums you set up how many layers to the z buffer you want with the new display flags then use calls to OpenGL when drawing. I keep looking in the manual for some magic al_set_z_buffer_blit_distance() function or some such.

Kris Asick
hyreia77 said:

I still don't see an 'easy way' to use the Z-Buffer.

For 2D rendering, there isn't one with Allegro. Often to get full performance out of one's code, you need to do things the hard way. :P

SiegeLord

Doing this should probably work (in 5.1):

ALLEGRO_TRANSFORM t;
al_identity_transform(&t);
/* Shift all subsequent drawing to depth z. */
al_translate_transform_3d(&t, 0, 0, z);
al_use_transform(&t);

z can range from -1 to 1. I don't remember which way 1 points though.
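For that to do anything, the display also needs a depth buffer, and depth testing has to be enabled; in 5.1 something like this should work:

/* Before al_create_display(): request a depth buffer. */
al_set_new_display_option(ALLEGRO_DEPTH_SIZE, 16, ALLEGRO_SUGGEST);

/* After creating the display: enable depth testing, and clear the
   depth buffer at the start of each frame. */
al_set_render_state(ALLEGRO_DEPTH_TEST, 1);
al_clear_depth_buffer(1.0);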

Kris Asick

While that would work, I don't think it would perform very well if tons and tons of stuff were on completely different depth levels because of how Allegro handles its transformation system. I could be wrong about that though.

Calling the transformation routines just a few times a frame should be OK.

SiegeLord

That's a good point... I do happen to know that no GPU calls are made when calling that function if the drawing is held (transformations are pre-multiplied in software).

Polybios

Kris Asick said:

Also, try not to exceed 4096x4096, otherwise you're going to create severe bottlenecks for low-end or older video cards.

Is 4096*4096 going to work on all devices Allegro 5 supports? Mobile devices, too?

Last time I checked (about... 6 years ago ;D ::)), people suggested a safe maximum of 2048*2048...

Chris Katko
Polybios said:

Is 4096*4096 going to work on all devices Allegro 5 supports? Mobile devices, too?

Last time I checked (about... 6 years ago ;D ::)), people suggested a safe maximum of 2048*2048...

It's fairly obvious that mobile devices are a separate issue, and 2048x2048 is the maximum texture size for many mobile phones. :P

Kris Asick
Polybios said:

Is 4096*4096 going to work on all devices Allegro 5 supports? Mobile devices, too?

Probably not. I often forget that you can make mobile games with Allegro 5. ::)

For computers though, 4096 is the safe maximum for sure. Most modern video cards can handle up to 8192 without issue, but at 32 bits per pixel an 8192x8192 bitmap takes up 256 MB of video memory, and even a 4096x4096 bitmap takes up 64 MB. So even if a graphics chipset can handle a particular resolution, you have to consider how much memory you're using in the process. :P
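Rather than hard-coding a limit, 5.1 can also tell you at runtime what the current display supports; a small sketch:

/* Largest bitmap dimension (width or height) this display supports. */
int max_size = al_get_display_option(display, ALLEGRO_MAX_BITMAP_SIZE);
if (max_size < 4096) {
   /* fall back to smaller atlases on this hardware */
}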
