Allegro.cc - Online Community

A5 / GPU programming. What's the strategy?
Chris Katko
Member #1,881
January 2002

What do you minimize? What do you maximize? What is the plan when it comes to getting lots of performance from a 2-D GPU accelerated game?

-----sig:
“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs

Arthur Kalliokoski
Second in Command
February 2005

How could a 2D game possibly need optimization like that? It's the reason 1990 era games were 2D, even a 286 had power to burn for those.

“Throughout history, poverty is the normal condition of man. Advances which permit this norm to be exceeded — here and there, now and then — are the work of an extremely small minority, frequently despised, often condemned, and almost always opposed by all right-thinking people. Whenever this tiny minority is kept from creating, or (as sometimes happens) is driven out of a society, the people then slip back into abject poverty. This is known as "bad luck.”

― Robert A. Heinlein

Kris Asick
Member #1,424
July 2001

1. Switch the active bitmap/texture as few times as possible between rendering calls. When using sub-bitmaps, draw everything you can from one source bitmap first before drawing stuff from another. Each switch of the active bitmap/texture is a performance hit that can become drastic if you're doing it too many times per frame.

1b. In Allegro 5, use al_hold_bitmap_drawing() to make this kind of optimization to your rendering pipeline even MORE high-performance!

2. Allegro 5 has the ability to draw low-level primitives with al_draw_prim(). However, there's a huge overhead cost to call this function, thus calling it more than just a few times per frame can kill your framerate. If you must use it, try to gather all the vertices you would have passed to separate al_draw_prim() calls into a single large array, and draw that array in its entirety with a single call to al_draw_prim().

3. The Z-Buffer still has its uses in 2D rendering, because you can render things at different Z depths to obscure other things and be able to draw stuff out of order without affecting the visual quality. Plus, if you don't need to use the Z-Buffer you can turn it off for a very small performance boost. (I think it's off by default in A5.) You'll still need to manually order translucent entities though, just like in a 3D game.

4. If you absolutely must draw individual pixels for whatever reason, you need to use a fragment shader, otherwise every pixel you draw is going to take up the same amount of CPU time as a full-screen bitmap. Fragment shaders give you direct access to the raw power of the GPU without the CPU getting in the way and allow you to do things at the texel level. The trick is that because the CPU isn't getting in the way you don't have as much access to details outside of what the GPU is doing.
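
To illustrate point 2, here is a minimal sketch in plain C of collecting many textured quads into one vertex array so they can be submitted in a single call. It makes no Allegro calls: the Vertex struct and the append_quad() helper are hypothetical stand-ins, and the triangle winding is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a textured vertex (position plus texture
   coordinates); a real program would use the library's own vertex type. */
typedef struct {
    float x, y, z;
    float u, v;
} Vertex;

#define VERTS_PER_QUAD 6   /* two triangles, three vertices each */

/* Append one axis-aligned textured quad to `out` as part of a triangle list.
   (x, y, w, h) is the destination rectangle and (u, v, uw, vh) the source
   rectangle in the texture. Returns the new vertex count. */
size_t append_quad(Vertex *out, size_t count, size_t capacity,
                   float x, float y, float w, float h,
                   float u, float v, float uw, float vh)
{
    assert(out != NULL);
    assert(count + VERTS_PER_QUAD <= capacity);

    const Vertex tl = { x,     y,     0.0f, u,      v      };
    const Vertex tr = { x + w, y,     0.0f, u + uw, v      };
    const Vertex bl = { x,     y + h, 0.0f, u,      v + vh };
    const Vertex br = { x + w, y + h, 0.0f, u + uw, v + vh };

    /* First triangle: top-left, top-right, bottom-left. */
    out[count + 0] = tl;
    out[count + 1] = tr;
    out[count + 2] = bl;
    /* Second triangle: top-right, bottom-right, bottom-left. */
    out[count + 3] = tr;
    out[count + 4] = br;
    out[count + 5] = bl;

    return count + VERTS_PER_QUAD;
}
```

In Allegro 5 the corresponding type would be ALLEGRO_VERTEX (which also carries a color), and the filled array would go to one al_draw_prim() call as a triangle list, with the shared atlas as its texture.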

That's just off the top of my head, but yeah, there are definitely steps you can take to ensure you get full performance out of your games, and there can be some serious framerate penalties for not taking them. For instance, the first time I wrote a hardware accelerated mapping system, it was only getting a framerate of about 10, but I had no idea I shouldn't be switching the primary texture every frame. (This was not with Allegro but a different library.) I shifted all my textures onto a single main texture and my framerate shot up to 540. ;D
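
The cost of texture switching described here can be made visible with a small sketch: group draw requests by texture before issuing them and count how many switches remain. This is plain C with hypothetical names (DrawCmd, sort_by_texture, count_texture_switches), not Allegro API. It is only safe within a layer whose draw order does not matter, since qsort() is not stable and sorting changes the order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical draw request: which texture it samples and where it goes. */
typedef struct {
    int texture_id;
    float x, y;
} DrawCmd;

/* qsort comparator: order commands by texture id. */
static int compare_by_texture(const void *a, const void *b)
{
    const DrawCmd *da = a;
    const DrawCmd *db = b;
    return (da->texture_id > db->texture_id) - (da->texture_id < db->texture_id);
}

/* Count how many times the active texture would change if the commands
   were issued in array order (the first bind counts as one). */
size_t count_texture_switches(const DrawCmd *cmds, size_t n)
{
    size_t switches = 0;
    for (size_t i = 0; i < n; i++) {
        if (i == 0 || cmds[i].texture_id != cmds[i - 1].texture_id)
            switches++;
    }
    return switches;
}

/* Reorder commands so that draws sharing a texture are adjacent. */
void sort_by_texture(DrawCmd *cmds, size_t n)
{
    assert(cmds != NULL || n == 0);
    if (n > 1)
        qsort(cmds, n, sizeof *cmds, compare_by_texture);
}
```

With six commands alternating between three textures, the unsorted order needs six binds and the sorted order three. Putting everything on one atlas brings that down to a single bind.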

--- Kris Asick (Gemini)
--- http://www.pixelships.com

Thomas Fjellstrom
Member #476
June 2000

How could a 2D game possibly need optimization like that? It's the reason 1990 era games were 2D, even a 286 had power to burn for those.

3D hardware is rather bad at 2D. It wasn't made for large scrolling backgrounds using tons of unique sprites.

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

SiegeLord
Member #7,827
October 2006

However, there's a huge overhead cost to call this function, thus calling it more than just a few times per frame can kill your framerate.

If you mean calling it more than 6530 times per frame dropping your frame rate below 60 FPS "killing" it. By that metric al_draw_bitmap is even less efficient, as it takes only 3450 calls per frame to drop the frame rate below 60 FPS (no bitmap holding) on my system :P (both measured with ex_draw_bitmap).

"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18
[SiegeLord's Abode][Codes]:[DAllegro5]:[RustAllegro]

Kris Asick
Member #1,424
July 2001

SiegeLord said:

If you mean calling it more than 6530 times per frame dropping your frame rate below 60 FPS "killing" it.

When I first wrote my mapping system using al_draw_prim(), my framerate was about 10 to 12 seconds per frame. :o

I switched to using one of the al_draw_bitmap functions for most of it and STILL wasn't able to get a perfect 60 FPS. Then I was able to condense all of the things I absolutely had to draw with al_draw_prim() into a single call, and now the FPS can get well over 300. ;)

This was a few Allegro 5.0.x versions ago... I think 5.0.6ish, so maybe this has changed since? *shrugs*

--- Kris Asick (Gemini)
--- http://www.pixelships.com

Chris Katko
Member #1,881
January 2002

Without optimizing some of my routines, I was down to 3 FPS on my quad core Athlon X4 630 with a GeForce GTX 560 SE video card. I cut things down by separating everything first by unique ship, then giving each ship layers (tile, sprite, fire, oxygen, and "pretty", which is the ship bitmap), and only updating a layer when it changes.

[Image: LAoUSPH.png, 1904x968]

However, redrawing the tile layer is still ridiculously slow. And the sprite layer is too slow (12-24 FPS) on a 5,120x5,120 bitmap. It's large intentionally to test for speed issues.

It's also intentionally the size of a Space Station 13 map:

[Image: fGzgBwB.jpg, 5294x4532]

But it makes sense now that I need to group all my tiles and sprites into one texture to cut down on texture binds. A 160x160 map is 25,600 tiles, and currently each one gets its own texture bind. Also, the point about putpixel being extremely slow is super helpful. The only thing I'm doing with putpixel is drawing random (but static per game run) stars (white pixels) and moving them with respect to the screen to show ship movement. 400 stars was killing me once I had optimized the layers. I'm at 200 currently, and knowing that I can swap them all for a texture is going to be helpful.
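
Grouping tiles into one texture means each tile becomes a sub-rectangle of that texture. A minimal sketch of the lookup, assuming square tiles packed left to right and top to bottom (the AtlasRegion type, atlas_region() function, and the layout parameters are hypothetical, not from this thread):

```c
#include <assert.h>

/* Source rectangle inside an atlas texture, in pixels. */
typedef struct {
    int sx, sy, sw, sh;
} AtlasRegion;

/* Map a tile index to its rectangle in an atlas that stores square tiles
   left to right, top to bottom. `tiles_per_row` and `tile_size` describe
   an assumed atlas layout. */
AtlasRegion atlas_region(int tile_index, int tiles_per_row, int tile_size)
{
    assert(tile_index >= 0);
    assert(tiles_per_row > 0 && tile_size > 0);

    AtlasRegion r;
    r.sx = (tile_index % tiles_per_row) * tile_size;  /* column -> x */
    r.sy = (tile_index / tiles_per_row) * tile_size;  /* row    -> y */
    r.sw = tile_size;
    r.sh = tile_size;
    return r;
}
```

The resulting sx, sy, sw, sh values are the kind of arguments al_draw_bitmap_region() takes, so every tile can be drawn from the same source bitmap without a texture switch.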

Lastly, I've also considered partitioning large ships into segments so I only update the "dirty" segments of, say, 32x32 tiles instead of the full 160x160. That would result in (160/32)^2 = 25 segments * (# of layers) for a full-size ship.
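
A rough sketch of that dirty-segment bookkeeping in plain C, using the 160x160 map and 32x32 segments mentioned here (160/32 = 5 segments per side, 25 in total). The DirtyGrid type and its functions are hypothetical names, and one such grid would be kept per layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAP_TILES         160   /* map is 160x160 tiles */
#define SEGMENT_TILES     32    /* each segment covers 32x32 tiles */
#define SEGMENTS_PER_SIDE (MAP_TILES / SEGMENT_TILES)               /* 5 */
#define SEGMENT_COUNT     (SEGMENTS_PER_SIDE * SEGMENTS_PER_SIDE)   /* 25 */

/* One flag per segment: true means the cached texture for that segment
   must be redrawn before it is next displayed. */
typedef struct {
    bool dirty[SEGMENT_COUNT];
} DirtyGrid;

/* Mark every segment as clean. */
void dirty_grid_clear(DirtyGrid *g)
{
    assert(g != NULL);
    memset(g->dirty, 0, sizeof g->dirty);
}

/* Mark the segment containing tile (tile_x, tile_y) as dirty and return
   that segment's index. */
int dirty_grid_mark_tile(DirtyGrid *g, int tile_x, int tile_y)
{
    assert(g != NULL);
    assert(tile_x >= 0 && tile_x < MAP_TILES);
    assert(tile_y >= 0 && tile_y < MAP_TILES);

    int seg = (tile_y / SEGMENT_TILES) * SEGMENTS_PER_SIDE
            + (tile_x / SEGMENT_TILES);
    g->dirty[seg] = true;
    return seg;
}

/* Number of segments that currently need a redraw. */
int dirty_grid_count(const DirtyGrid *g)
{
    assert(g != NULL);
    int n = 0;
    for (int i = 0; i < SEGMENT_COUNT; i++)
        n += g->dirty[i] ? 1 : 0;
    return n;
}
```

Each frame, only the segments whose flag is set would be re-rendered to their cached texture, after which the grid is cleared again.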

-----sig:
“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs

Kris Asick
Member #1,424
July 2001

A couple more optimizations to consider:

1. Try to make all your texture sizes powers of 2. Not every video card can properly handle texture sizes that aren't. Also, try not to exceed 4096x4096, otherwise you're going to create severe bottlenecks for low-end or older video cards.

2. Something I mentioned in another thread recently: If you want to do parallaxing starfields, the best approach is to use multiple layers of random pixels on top of each other, each parallaxing at a different speed. Drawing 4 or 5 large starfield layers is going to be easier on the CPU than drawing 500+ individual stars, and since the GPU often outpaces the CPU by huge amounts, it's not like you're gonna be wasting GPU time doing this. ;)
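
For the layered starfield in point 2, each layer only needs a scroll offset derived from the camera position. A minimal integer-only sketch, where parallax_offset() and its divisor parameter are hypothetical (a divisor of 1 moves with the camera, larger values move more slowly and so look farther away):

```c
#include <assert.h>

/* Scroll offset for one starfield layer, wrapped into [0, layer_size).
   `camera` is the camera position in pixels along one axis, `divisor`
   slows the layer down, and `layer_size` is the layer bitmap's size
   along that axis. */
int parallax_offset(int camera, int divisor, int layer_size)
{
    assert(divisor > 0 && layer_size > 0);

    int offset = (camera / divisor) % layer_size;
    if (offset < 0)
        offset += layer_size;   /* C's % can be negative for negative input */
    return offset;
}
```

Calling it once per axis for each of the 4 or 5 layers gives the position at which to draw (and wrap) that layer's pre-rendered star bitmap.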

--- Kris Asick (Gemini)
--- http://www.pixelships.com

Aikei_c
Member #14,871
January 2013

Draw only what is currently on the screen. And when you zoom out, make things less detailed so that you need to draw fewer bitmaps than drawing the whole map at full detail would require.
Edit: You could also try grouping your tiles into bigger tiles: I see you have a lot of the same tiles going one after another. You could group tiles into, for instance, 3x3 sections, 5x5 sections etc. so that you could draw them with one call to al_draw_bitmap instead of 9 or 25 calls.
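
"Draw only what is on the screen" comes down to computing which tile columns and rows overlap the view. A minimal sketch in plain C, with hypothetical names (TileRange, visible_tiles) and the assumption that the view overlaps the map at least partly:

```c
#include <assert.h>

/* Inclusive range of tile columns and rows that intersect the view. */
typedef struct {
    int first_col, last_col;
    int first_row, last_row;
} TileRange;

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Compute the tiles overlapped by a camera rectangle given in map pixels.
   (cam_x, cam_y) is the view's top-left corner; the result is clamped to
   the map, so it assumes the view overlaps the map at least partly. */
TileRange visible_tiles(int cam_x, int cam_y, int view_w, int view_h,
                        int tile_size, int map_cols, int map_rows)
{
    assert(tile_size > 0 && map_cols > 0 && map_rows > 0);
    assert(view_w > 0 && view_h > 0);

    TileRange r;
    r.first_col = clamp_int(cam_x / tile_size, 0, map_cols - 1);
    r.first_row = clamp_int(cam_y / tile_size, 0, map_rows - 1);
    r.last_col  = clamp_int((cam_x + view_w - 1) / tile_size, 0, map_cols - 1);
    r.last_row  = clamp_int((cam_y + view_h - 1) / tile_size, 0, map_rows - 1);
    return r;
}
```

For a 1904x968 view over a 160x160 map of 32-pixel tiles, this yields 60 columns by 31 rows, or 1,860 tiles instead of all 25,600.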

Jonatan Hedborg
Member #4,886
July 2004

One optimization (that I'm not sure you can do in Allegro) would be to store your entire map, assuming it's fairly static (updated infrequently), as a big vertex buffer object. This requires that you use the same texture (atlas), shader, etc. for the whole thing. But once it's uploaded to the GPU, drawing it will be blazingly fast.

-------
Sweden: Free from the shackles of Democracy since 2008-06-18!

Chris Katko
Member #1,881
January 2002

Aikei_c said:

Edit: You could also try grouping your tiles into bigger tiles: I see you have a lot of the same tiles going one after another. You could group tiles into, for instance, 3x3 sections, 5x5 sections etc. so that you could draw them with one call to al_draw_bitmap instead of 9 or 25 calls.

assuming it's fairly static (updated infrequently),

Actually, it's the opposite, depending on the time domain you're thinking of. The maps, which represent ships, are actually fully destructible. I considered doing something to that effect in a 3-D game I was working on before, wherein cached versions are used until they are dirty, and then the manual drawing mode is used until a helper thread has generated a new cached version.

However, in this game, I'm not too worried about that. I already cache the "map" to a texture and only update the texture as necessary. It's slow to redraw the updates, but blazing fast to draw normally. So my solution might be, as I think I mentioned, to partition the map into zones, and only update the dirty zones instead of redrawing the entire map. Additionally, sprites/objects don't count as draws for the map because they're on their own map layer. So the biggest update rate is the objects, and objects (as it stands) are far fewer in number than tiles.

1. Try to make all your texture sizes powers of 2. Not every video card can properly handle texture sizes that aren't. Also, try not to exceed 4096x4096, otherwise you're going to create severe bottlenecks for low-end or older video cards.

2. Something I mentioned in another thread recently: If you want to do parallaxing starfields, the best approach is to use multiple layers of random pixels on top of each other, each parallaxing at a different speed. Drawing 4 or 5 large starfield layers is going to be easier on the CPU than drawing 500+ individual stars, and since the GPU often outpaces the CPU by huge amounts, it's not like you're gonna be wasting GPU time doing this. ;)

You are spot on.

-----sig:
“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs

hyreia77
Member #13,742
November 2011

3. The Z-Buffer still has its uses in 2D rendering, because you can render things at different Z depths to obscure other things and be able to draw stuff out of order without affecting the visual quality. Plus, if you don't need to use the Z-Buffer you can turn it off for a very small performance boost. (I think it's off by default in A5.) You'll still need to manually order translucent entities though, just like in a 3D game.

I still don't see an 'easy way' to use the Z-Buffer. From what I've seen in the forums you set up how many layers to the z buffer you want with the new display flags then use calls to OpenGL when drawing. I keep looking in the manual for some magic al_set_z_buffer_blit_distance() function or some such.

Kris Asick
Member #1,424
July 2001

hyreia77 said:

I still don't see an 'easy way' to use the Z-Buffer.

For 2D rendering, there isn't one with Allegro. Often to get full performance out of one's code, you need to do things the hard way. :P

--- Kris Asick (Gemini)
--- http://www.pixelships.com

SiegeLord
Member #7,827
October 2006

Doing this should probably work (in 5.1):

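ALLEGRO_TRANSFORM t;
al_identity_transform(&t);               /* start from no transformation */
al_translate_transform_3d(&t, 0, 0, z);  /* shift along the Z axis only */
al_use_transform(&t);                    /* applies to subsequent drawing */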

z can range from -1 to 1. I don't remember which way 1 points though.
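
If drawing layers are numbered, one way to pick z for each is to spread the layer indices evenly over that range. A minimal sketch, where layer_to_z() is a hypothetical helper and the direction (layer 0 at -1, last layer at +1) is an assumption that would need checking against which way the depth test actually points:

```c
#include <assert.h>

/* Map a draw layer index to a depth value in [-1, 1]. Layer 0 maps to -1
   and the last layer to +1; whether that is front-to-back or back-to-front
   is an assumption to verify, not something established here. */
float layer_to_z(int layer, int layer_count)
{
    assert(layer_count > 1);
    assert(layer >= 0 && layer < layer_count);

    return -1.0f + 2.0f * (float)layer / (float)(layer_count - 1);
}
```

The returned value would be what gets passed as z to the translate call above.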

"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18
[SiegeLord's Abode][Codes]:[DAllegro5]:[RustAllegro]

Kris Asick
Member #1,424
July 2001

While that would work, I don't think it would perform very well if tons and tons of stuff were on completely different depth levels because of how Allegro handles its transformation system. I could be wrong about that though.

Calling the transformation routines just a few times a frame should be OK.

--- Kris Asick (Gemini)
--- http://www.pixelships.com

SiegeLord
Member #7,827
October 2006

That's a good point... I do happen to know that no GPU calls are made when calling that function if the drawing is held (transformations are pre-multiplied in software).

"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18
[SiegeLord's Abode][Codes]:[DAllegro5]:[RustAllegro]

Polybios
Member #12,293
October 2010

Also, try not to exceed 4096x4096, otherwise you're going to create severe bottlenecks for low-end or older video cards.

Is 4096*4096 going to work on all devices Allegro 5 supports? Mobile devices, too?

Last time I checked (about... 6 years ago ;D ::)), people suggested a safe maximum of 2048*2048...

Chris Katko
Member #1,881
January 2002

Polybios said:

Is 4096*4096 going to work on all devices Allegro 5 supports? Mobile devices, too?

Last time I checked (about... 6 years ago ;D ::)), people suggested a safe maximum of 2048*2048...

It's fairly obvious that mobile devices are a separate issue, and 2048x2048 is the maximum texture size for many mobile phones. :P

-----sig:
“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs

Kris Asick
Member #1,424
July 2001

Polybios said:

Is 4096*4096 going to work on all devices Allegro 5 supports? Mobile devices, too?

Probably not. I often forget that you can make mobile games with Allegro 5. ::)

For computers though, 4096 is the safe maximum for sure. Most modern video cards can handle up to 8192 without issue, but an 8192x8192 bitmap at 32 bits per pixel takes up 256 MB of video memory. Even a 4096x4096 bitmap takes up 64 MB. So even if a graphics chipset can handle a particular resolution, you have to consider how much memory you're using in the process. :P
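
Those figures follow from width x height x bytes per pixel. A small sketch of the arithmetic, assuming uncompressed 32-bit (4 bytes per pixel) textures with no mipmaps or driver padding; texture_bytes() and bytes_to_mib() are hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Raw size of an uncompressed texture in bytes. */
uint64_t texture_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return (uint64_t)width * height * bytes_per_pixel;
}

/* Convert bytes to whole mebibytes (1 MiB = 1024 * 1024 bytes). */
uint64_t bytes_to_mib(uint64_t bytes)
{
    return bytes / (1024u * 1024u);
}
```

By the same calculation a 2048x2048 texture is 16 MB, and the 5,120x5,120 map bitmap mentioned earlier in the thread would be 100 MB per layer.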

--- Kris Asick (Gemini)
--- http://www.pixelships.com
