<?xml version="1.0"?>
<rss version="2.0">
	<channel>
		<title>A5 / GPU programming. What&#39;s the strategy?</title>
		<link>http://www.allegro.cc/forums/view/613609</link>
		<description>Allegro.cc Forum Thread</description>
		<webMaster>matthew@allegro.cc (Matthew Leverton)</webMaster>
		<lastBuildDate>Mon, 09 Dec 2013 20:14:28 +0000</lastBuildDate>
	</channel>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>What do you minimize? What do you maximize? What is the plan when it comes to getting lots of performance from a 2-D GPU accelerated game?
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Chris Katko)</author>
		<pubDate>Fri, 06 Dec 2013 15:10:00 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>How could a 2D game possibly need optimization like that?  It&#39;s the reason 1990 era games were 2D, even a 286 had power to burn for those.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Arthur Kalliokoski)</author>
		<pubDate>Fri, 06 Dec 2013 15:12:32 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>1. Switch the active bitmap/texture as few times as possible between rendering calls. When using sub-bitmaps, draw everything you can from one source bitmap first before drawing stuff from another. Each switch of the active bitmap/texture is a performance hit that can become drastic if you&#39;re doing it too many times per frame.</p><p>1b. In Allegro 5, use al_hold_bitmap_drawing() to make this kind of optimization to your rendering pipeline even MORE high-performance!</p><p>2. Allegro 5 has the ability to draw low-level primitives with al_draw_prim(). However, there&#39;s a huge overhead cost to call this function, thus calling it more than just a few times per frame can kill your framerate. If you must use it, try to group all calls to al_draw_prim() into a single large array, and draw that array in its entirety in a single call to al_draw_prim().</p><p>3. The Z-Buffer still has its uses in 2D rendering, because you can render things at different Z depths to obscure other things and be able to draw stuff out of order without affecting the visual quality. Plus, if you don&#39;t need to use the Z-Buffer you can turn it off for a very small performance boost. (I think it&#39;s off by default in A5.) You&#39;ll still need to manually order translucent entities though, just like in a 3D game.</p><p>4. If you absolutely must draw individual pixels for whatever reason, you need to use a fragment shader, otherwise every pixel you draw is going to take up the same amount of CPU time as a full-screen bitmap. Fragment shaders give you direct access to the raw power of the GPU without the CPU getting in the way and allow you to do things at the texel level. 
The trick is that because the CPU isn&#39;t getting in the way you don&#39;t have as much access to details outside of what the GPU is doing.</p><p>That&#39;s just off the top of my head, but yeah, there are definitely steps you can take to ensure you get full performance out of your games, and there can be some serious framerate penalties for not doing these things. For instance, the first time I wrote a hardware-accelerated mapping system, it was only getting a framerate of about 10ish, because I had no idea I shouldn&#39;t be switching the primary texture every frame. (This was not with Allegro but a different library.) I shifted all my textures onto a single main texture and my framerate shot up to 540. <img src="http://www.allegro.cc/forums/smileys/grin.gif" alt=";D" />
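</p><p>To make the cost of texture switching in point 1 concrete, here is a minimal sketch in plain C. It makes no Allegro calls and is not how Allegro tracks state internally. The function name count_texture_binds and the integer texture ids are hypothetical; the sketch only counts how many times the bound texture would change for a given draw order.</p><div class="source-code snippet"><div class="inner"><pre>
```c
/* Count how many texture binds a draw order would cause.
   texture_ids[i] is a hypothetical, non-negative id of the
   texture used by draw call i. */
unsigned count_texture_binds(const int *texture_ids, unsigned count)
{
    unsigned binds = 0;
    int bound = -1;   /* -1 means no texture is bound yet */
    unsigned i;

    for (i = 0; i != count; ++i) {
        if (texture_ids[i] != bound) {
            bound = texture_ids[i];   /* a switch would happen here */
            ++binds;
        }
    }
    return binds;
}
```
</pre></div></div><p>Under these assumptions, six draws that alternate between two textures cost six binds, while the same six draws grouped by texture cost only two.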
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Kris Asick)</author>
		<pubDate>Fri, 06 Dec 2013 16:21:21 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title"><a href="http://www.allegro.cc/forums/thread/613609/993680#target">Arthur Kalliokoski</a> said:</div><div class="quote"><p>How could a 2D game possibly need optimization like that? It&#39;s the reason 1990 era games were 2D, even a 286 had power to burn for those.</p></div></div><p>3D hardware is rather bad at 2D. It wasn&#39;t made for large scrolling backgrounds using tons of unique sprites.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Thomas Fjellstrom)</author>
		<pubDate>Fri, 06 Dec 2013 22:58:51 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title"><a href="http://www.allegro.cc/forums/thread/613609/993682#target">Kris Asick</a> said:</div><div class="quote"><p> However, there&#39;s a huge overhead cost to call this function, thus calling it more than just a few times per frame can kill your framerate.
</p></div></div><p>If you mean calling it more than 6530 times per frame dropping your frame rate below 60 FPS &quot;killing&quot; it. By that metric <span class="source-code"><a href="http://www.allegro.cc/manual/al_draw_bitmap"><span class="a">al_draw_bitmap</span></a></span> is even less efficient, as it takes only 3450 calls per frame to drop the frame rate below 60 FPS (no bitmap holding) on my system <img src="http://www.allegro.cc/forums/smileys/tongue.gif" alt=":P" /> (both measured by <span class="source-code">ex_draw_bitmap</span>).
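</p><p>Reading those benchmark figures as a per-call budget may help put them in perspective. The plain C sketch below converts a number of calls per frame at a target frame rate into nanoseconds per call. The helper name ns_per_call is hypothetical, and the result is only an upper bound, since it assumes the draw calls are the only thing consuming frame time.</p><div class="source-code snippet"><div class="inner"><pre>
```c
/* Nanoseconds available per draw call when calls_per_frame calls
   exactly fill one frame at target_fps frames per second.
   Integer division rounds down; both arguments must be non-zero. */
unsigned long ns_per_call(unsigned long calls_per_frame,
                          unsigned long target_fps)
{
    return 1000000000UL / (calls_per_frame * target_fps);
}
```
</pre></div></div><p>With the numbers quoted above, ns_per_call(6530, 60) gives about 2,552 ns per al_draw_prim call and ns_per_call(3450, 60) gives about 4,830 ns per al_draw_bitmap call, on that one system only.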
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (SiegeLord)</author>
		<pubDate>Sat, 07 Dec 2013 00:04:56 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title"><a href="http://www.allegro.cc/forums/thread/613609/993692#target">SiegeLord</a> said:</div><div class="quote"><p>If you mean calling it more than 6530 times per frame dropping your frame rate below 60 FPS &quot;killing&quot; it.</p></div></div><p>
When I first wrote my mapping system using al_draw_prim(), my framerate was about 10 to 12 seconds per frame. <img src="http://www.allegro.cc/forums/smileys/shocked.gif" alt=":o" /></p><p>I switched to using one of the al_draw_bitmap functions for most of it and STILL wasn&#39;t able to get a perfect 60 FPS. Then I was able to condense all of the things I absolutely had to draw with al_draw_prim() into a single call, and now the FPS can get well over 300. <img src="http://www.allegro.cc/forums/smileys/wink.gif" alt=";)" /></p><p>This was a few Allegro 5.0.x versions ago... I think 5.0.6ish, so maybe this has changed since? *shrugs*
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Kris Asick)</author>
		<pubDate>Sat, 07 Dec 2013 03:00:50 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Without optimizing some of my routines, I was down to 3 FPS on my quad core Athlon X4 630, with a GeForce GTX 560 SE videocard. I cut things down by separating everything first by unique ship, and then each ship has layers (tile, sprite, fire, oxygen, &quot;pretty&quot;=ship bitmap), and only updating the layers as they change. </p><p><span class="remote-thumbnail"><img src="http://www.allegro.cc//djungxnpq2nug.cloudfront.net/image/cache/b/3/b34c3b969a94fb9988b200603edf564e-240.jpg" alt="LAoUSPH.png" width="240" height="122" /></span></p><p>However, redrawing the tile layer is still ridiculously slow. And the sprite layer is too slow (12-24 FPS) on a 5,120x5,120 bitmap. It&#39;s large intentionally to test for speed issues. </p><p>It&#39;s also intentionally the size of a Space Station 13 map:</p><p><span class="remote-thumbnail"><img src="http://www.allegro.cc//djungxnpq2nug.cloudfront.net/image/cache/8/d/8da8243adb4f2fb2731b5cedf8f036c8-240.jpg" alt="fGzgBwB.jpg" width="240" height="205" /></span></p><p>But it makes sense now that I need to group <b>all</b> my tiles and sprites into one texture to cut down on texturebind calls. 160x160 is 25,600 tiles, currently each with their own texturebind. Also, the thing about putpixel being extremely slow is super helpful. 
The only thing I&#39;m doing with putpixel is drawing random (but static per game run) stars (white pixels) and moving them with respect to the screen to show ship movement. 400 stars were killing me once I optimized to the layers. 200 is the current count, but knowing that I can replace them all with a texture is going to be helpful. </p><p>Lastly, I&#39;ve also considered partitioning large ships into segments so I only update the &quot;dirty&quot; segments of, say, 32x32 instead of 160x160. That would result in (160/32)^2 = 25 segments * (# of layers) for a full size ship.
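</p><p>The segment arithmetic can be sketched in plain C. This assumes a 160x160-tile ship split into 32x32-tile segments, which gives 5 segments per edge and 25 per layer. The names SHIP_TILES, SEGMENT_TILES, segment_of_tile and mark_tile_dirty are made up for illustration and are not part of Allegro.</p><div class="source-code snippet"><div class="inner"><pre>
```c
enum { SHIP_TILES = 160, SEGMENT_TILES = 32 };

/* Segments along one edge of the ship, and in one whole layer. */
int segments_per_edge(void) { return SHIP_TILES / SEGMENT_TILES; }
int segments_per_ship(void) { return segments_per_edge() * segments_per_edge(); }

/* Index of the segment containing tile (tx, ty).
   Assumes both coordinates are in the range 0 to 159. */
int segment_of_tile(int tx, int ty)
{
    return (ty / SEGMENT_TILES) * segments_per_edge() + (tx / SEGMENT_TILES);
}

/* Flag the segment containing a changed tile so that only
   that segment is redrawn into the cached layer texture.
   dirty must have room for segments_per_ship() entries. */
void mark_tile_dirty(unsigned char *dirty, int tx, int ty)
{
    dirty[segment_of_tile(tx, ty)] = 1;
}
```
</pre></div></div><p>A redraw pass would then walk the 25 flags for each layer, redraw only the flagged segments, and clear the flags afterwards.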
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Chris Katko)</author>
		<pubDate>Sat, 07 Dec 2013 06:18:11 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>A couple more optimizations to consider:</p><p>1. Try to make all your texture sizes powers of 2. Not every video card can properly handle texture sizes that aren&#39;t. Also, try not to exceed 4096x4096, otherwise you&#39;re going to create severe bottlenecks for low-end or older video cards.</p><p>2. Something I mentioned in another thread recently: If you want to do parallaxing starfields, the best approach is to use multiple layers of random pixels on top of each other, each parallaxing at a different speed. Drawing 4 or 5 large starfield layers is going to be easier on the CPU than drawing 500+ individual stars, and since the GPU often outpaces the CPU by huge amounts, it&#39;s not like you&#39;re gonna be wasting GPU time doing this. <img src="http://www.allegro.cc/forums/smileys/wink.gif" alt=";)" />
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Kris Asick)</author>
		<pubDate>Sat, 07 Dec 2013 06:25:37 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Draw only what is currently on the screen. And when you zoom out, make things less detailed so that you draw fewer bitmaps than the whole map would otherwise require.<br />Edit: You could also try grouping your tiles into bigger tiles: I see you have a lot of the same tiles going one after another. You could group tiles into, for instance, 3x3 sections, 5x5 sections etc. so that you could draw them with one call to al_draw_bitmap instead of 9 or 25 calls.
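</p><p>As a rough sketch of drawing only what is on screen, the plain C helper below counts how many tile columns (or rows) overlap the view along one axis. The name visible_tiles is hypothetical. The 32-pixel tile size is inferred from the 5,120-pixel bitmap and 160-tile map mentioned earlier in the thread, and the 1920x1080 view used in the note afterwards is an assumption.</p><div class="source-code snippet"><div class="inner"><pre>
```c
/* Number of tiles overlapping the view along one axis.
   camera    : first visible pixel of the view on that axis
   view_size : length of the view in pixels (at least 1)
   tile_size : edge length of one tile in pixels (non-zero)
   No clamping to the map edge is done here. */
unsigned visible_tiles(unsigned camera, unsigned view_size, unsigned tile_size)
{
    unsigned first = camera / tile_size;
    unsigned last = (camera + view_size - 1) / tile_size;
    return last - first + 1;
}
```
</pre></div></div><p>With 32-pixel tiles, a 1920x1080 view that is not aligned to the tile grid overlaps at most 61 x 35 = 2,135 tiles, compared with 25,600 tiles for the full 160x160 map.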
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Aikei_c)</author>
		<pubDate>Sat, 07 Dec 2013 10:35:10 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>One optimization (that I&#39;m not sure you can do in Allegro) would be to store your entire map, assuming it&#39;s fairly static (updated infrequently), as a big vertex buffer object. This requires that you have the same texture (atlas), shader etc. for the whole thing. But once it&#39;s uploaded to the GPU, drawing it will be blazingly fast.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Jonatan Hedborg)</author>
		<pubDate>Sun, 08 Dec 2013 04:30:48 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title"><a href="http://www.allegro.cc/forums/thread/613609/993703#target">Aikei_c</a> said:</div><div class="quote"><p>
Edit: You could also try grouping your tiles into bigger tiles: I see you have a lot of the same tiles going one after another. You could group tiles into, for instance, 3x3 sections, 5x5 sections etc. so that you could draw them with one call to al_draw_bitmap instead of 9 or 25 calls.
</p></div></div><p>
</p><div class="quote_container"><div class="title"><a href="http://www.allegro.cc/forums/thread/613609/993711#target">Jonatan Hedborg</a> said:</div><div class="quote"><p>
 assuming it&#39;s fairly static (updated infrequently), 
</p></div></div><p>
Actually, it&#39;s the opposite, depending on the time domain you&#39;re thinking of. The maps which represent ships are actually fully destructible. I&#39;ve considered doing something to that effect with a 3-D game I was working on before, wherein cached versions are used until they are dirty, and then the manual drawing mode is used until a helper thread is capable of generating the cached version.</p><p>However, in this game, I&#39;m not too worried about that. I already cache the &quot;map&quot; to a texture and only update the texture as necessary. It&#39;s slow to redraw the updates, but blazing fast to draw normally. So my solution might be, as I think I mentioned, to partition the map into zones, and only update the dirty zones instead of redrawing the entire map. Additionally, sprites/objects don&#39;t count as draws for the map because they&#39;re on their own map layer. So the biggest update rate is the objects, and objects (as it stands) are far fewer in number than tiles.</p><div class="quote_container"><div class="title"><a href="http://www.allegro.cc/forums/thread/613609/993696#target">Kris Asick</a> said:</div><div class="quote"><p>
1. Try to make all your texture sizes powers of 2. Not every video card can properly handle texture sizes that aren&#39;t. Also, try not to exceed 4096x4096, otherwise you&#39;re going to create severe bottlenecks for low-end or older video cards.</p><p>2. Something I mentioned in another thread recently: If you want to do parallaxing starfields, the best approach is to use multiple layers of random pixels on top of each other, each parallaxing at a different speed. Drawing 4 or 5 large starfield layers is going to be easier on the CPU than drawing 500+ individual stars, and since the GPU often outpaces the CPU by huge amounts, it&#39;s not like you&#39;re gonna be wasting GPU time doing this. <img src="http://www.allegro.cc/forums/smileys/wink.gif" alt=";)" />
</p></div></div><p>
You are spot on.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Chris Katko)</author>
		<pubDate>Sun, 08 Dec 2013 07:33:36 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title"><a href="http://www.allegro.cc/forums/thread/613609/993682#target">Kris Asick</a> said:</div><div class="quote"><p>3. The Z-Buffer still has its uses in 2D rendering, because you can render things at different Z depths to obscure other things and be able to draw stuff out of order without affecting the visual quality. Plus, if you don&#39;t need to use the Z-Buffer you can turn it off for a very small performance boost. (I think it&#39;s off by default in A5.) You&#39;ll still need to manually order translucent entities though, just like in a 3D game.</p></div></div><p>

I still don&#39;t see an &#39;easy way&#39; to use the Z-Buffer. From what I&#39;ve seen in the forums you set up how many layers to the z buffer you want with the new display flags then use calls to OpenGL when drawing. I keep looking in the manual for some magic al_set_z_buffer_blit_distance() function or some such.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (hyreia77)</author>
		<pubDate>Sun, 08 Dec 2013 11:25:18 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title"><a href="http://www.allegro.cc/forums/thread/613609/993724#target">hyreia77</a> said:</div><div class="quote"><p>I still don&#39;t see an &#39;easy way&#39; to use the Z-Buffer.</p></div></div><p>
For 2D rendering, there isn&#39;t one with Allegro. Often to get full performance out of one&#39;s code, you need to do things the hard way. <img src="http://www.allegro.cc/forums/smileys/tongue.gif" alt=":P" />
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Kris Asick)</author>
		<pubDate>Sun, 08 Dec 2013 11:46:14 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Doing this should probably work (in 5.1):</p><div class="source-code snippet"><div class="inner"><pre><a href="http://www.allegro.cc/manual/ALLEGRO_TRANSFORM"><span class="a">ALLEGRO_TRANSFORM</span></a> t<span class="k2">;</span>
<a href="http://www.allegro.cc/manual/al_identity_transform"><span class="a">al_identity_transform</span></a><span class="k2">(</span><span class="k3">&amp;</span>t<span class="k2">)</span><span class="k2">;</span>
al_translate_transform_3d<span class="k2">(</span><span class="k3">&amp;</span>t, <span class="n">0</span>, <span class="n">0</span>, z<span class="k2">)</span><span class="k2">;</span>
<a href="http://www.allegro.cc/manual/al_use_transform"><span class="a">al_use_transform</span></a><span class="k2">(</span><span class="k3">&amp;</span>t<span class="k2">)</span><span class="k2">;</span>
</pre></div></div><p>

<span class="source-code">z</span> can range from -1 to 1. I don&#39;t remember which way 1 points though.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (SiegeLord)</author>
		<pubDate>Sun, 08 Dec 2013 11:57:29 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>While that would work, I don&#39;t think it would perform very well if tons and tons of stuff were on completely different depth levels because of how Allegro handles its transformation system. I could be wrong about that though.</p><p>Calling the transformation routines just a few times a frame should be OK.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Kris Asick)</author>
		<pubDate>Mon, 09 Dec 2013 05:13:16 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>That&#39;s a good point... I do happen to know that no GPU calls are made when calling that function if the drawing is held (transformations are pre-multiplied in software).
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (SiegeLord)</author>
		<pubDate>Mon, 09 Dec 2013 12:44:05 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title"><a href="http://www.allegro.cc/forums/thread/613609/993696#target">Kris Asick</a> said:</div><div class="quote"><p>Also, try not to exceed 4096x4096, otherwise you&#39;re going to create severe bottlenecks for low-end or older video cards.</p></div></div><p>

Is 4096*4096 going to work on all devices Allegro 5 supports? Mobile devices, too?</p><p>Last time I checked (about... 6 years ago <img src="http://www.allegro.cc/forums/smileys/grin.gif" alt=";D" /> <img src="http://www.allegro.cc/forums/smileys/rolleyes.gif" alt="::)" />), people suggested a safe maximum of 2048*2048...
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Polybios)</author>
		<pubDate>Mon, 09 Dec 2013 14:37:14 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title"><a href="http://www.allegro.cc/forums/thread/613609/993779#target">Polybios</a> said:</div><div class="quote"><p>
Is 4096*4096 going to work on all devices Allegro 5 supports? Mobile devices, too?</p><p>Last time I checked (about... 6 years ago <img src="http://www.allegro.cc/forums/smileys/grin.gif" alt=";D" /> <img src="http://www.allegro.cc/forums/smileys/rolleyes.gif" alt="::)" />), people suggested a safe maximum of 2048*2048...
</p></div></div><p>
It&#39;s fairly obvious that mobile devices are a separate issue, and 2048x2048 is the maximum texture size for many mobile phones. <img src="http://www.allegro.cc/forums/smileys/tongue.gif" alt=":P" />
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Chris Katko)</author>
		<pubDate>Mon, 09 Dec 2013 17:30:15 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title"><a href="http://www.allegro.cc/forums/thread/613609/993779#target">Polybios</a> said:</div><div class="quote"><p>Is 4096*4096 going to work on all devices Allegro 5 supports? Mobile devices, too?</p></div></div><p>
Probably not. I often forget that you can make mobile games with Allegro 5. <img src="http://www.allegro.cc/forums/smileys/rolleyes.gif" alt="::)" /></p><p>For computers though, 4096 is the safe maximum for sure. Most modern video cards can handle up to 8192 without issue, but an 8192x8192 bitmap at 32 bits per pixel takes up 256 MB of video memory. Even a 4096x4096 bitmap takes up 64 MB. So even if a graphics chipset can handle a particular resolution, you have to consider how much memory you&#39;re using in the process. <img src="http://www.allegro.cc/forums/smileys/tongue.gif" alt=":P" />
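</p><p>The memory figures follow from width x height x bytes per pixel. The plain C sketch below reproduces them, assuming an uncompressed texture with 4 bytes per pixel (32-bit RGBA) and no mipmaps. The function names texture_bytes and texture_mib are hypothetical, and actual driver allocations may be larger.</p><div class="source-code snippet"><div class="inner"><pre>
```c
/* Bytes needed for an uncompressed width x height texture
   with the given number of bytes per pixel, without mipmaps. */
unsigned long long texture_bytes(unsigned width, unsigned height,
                                 unsigned bytes_per_pixel)
{
    return (unsigned long long)width * height * bytes_per_pixel;
}

/* The same size in whole MiB (1 MiB = 1024 * 1024 bytes). */
unsigned long long texture_mib(unsigned width, unsigned height,
                               unsigned bytes_per_pixel)
{
    return texture_bytes(width, height, bytes_per_pixel) / (1024ULL * 1024ULL);
}
```
</pre></div></div><p>Here texture_mib(8192, 8192, 4) is 256 and texture_mib(4096, 4096, 4) is 64, matching the figures above, while a 2048x2048 texture needs 16.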
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Kris Asick)</author>
		<pubDate>Mon, 09 Dec 2013 20:14:28 +0000</pubDate>
	</item>
</rss>
