<?xml version="1.0"?>
<rss version="2.0">
	<channel>
		<title>trans_blender, way too slow</title>
		<link>http://www.allegro.cc/forums/view/590613</link>
		<description>Allegro.cc Forum Thread</description>
		<webMaster>matthew@allegro.cc (Matthew Leverton)</webMaster>
		<lastBuildDate>Tue, 20 Mar 2007 00:42:13 +0000</lastBuildDate>
	</channel>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>In my game, to make pausing a little more... interesting, I used the trans blender to draw a translucent colour over the whole screen, cycling through all the colours of the rainbow... &#39;tis very nice.</p><p>But since I increased the screen size from 640*480 to 800*480 (for that sweet widescreenedness) it goes incredibly slowly. No, really: CPU usage goes from 12% to 100%, and the framerate drops from 60 FPS to about 20 FPS. Not so nice anymore.</p><p>Is there any way I can speed up the process?</p><div class="source-code snippet"><div class="inner"><pre><a href="http://www.allegro.cc/manual/set_trans_blender" target="_blank"><span class="a">set_trans_blender</span></a><span class="k2">(</span><span class="n">0</span>, <span class="n">0</span>, <span class="n">0</span>, <span class="n">75</span><span class="k2">)</span><span class="k2">;</span>

<a href="http://www.allegro.cc/manual/drawing_mode" target="_blank"><span class="a">drawing_mode</span></a><span class="k2">(</span>DRAW_MODE_TRANS, <span class="n">0</span>, <span class="n">0</span>, <span class="n">0</span><span class="k2">)</span><span class="k2">;</span>
<a href="http://www.allegro.cc/manual/rectfill" target="_blank"><span class="a">rectfill</span></a><span class="k2">(</span>Buffer, <span class="n">0</span>, <span class="n">0</span>, <span class="n">800</span>, <span class="n">480</span>, <a href="http://www.allegro.cc/manual/makecol" target="_blank"><span class="a">makecol</span></a><span class="k2">(</span>Red, Green, Blue<span class="k2">)</span><span class="k2">)</span><span class="k2">;</span>
<a href="http://www.allegro.cc/manual/drawing_mode" target="_blank"><span class="a">drawing_mode</span></a><span class="k2">(</span>DRAW_MODE_SOLID, <span class="n">0</span>, <span class="n">0</span>, <span class="n">0</span><span class="k2">)</span><span class="k2">;</span>
</pre></div></div><p>
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (The Unknown)</author>
		<pubDate>Mon, 19 Mar 2007 07:02:14 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Use something with hardware acceleration, like OpenLayer.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (BAF)</author>
		<pubDate>Mon, 19 Mar 2007 07:09:08 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p><a href="http://sourceforge.net/projects/fblend/">Or fblend</a>
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (kazzmir)</author>
		<pubDate>Mon, 19 Mar 2007 07:21:39 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>For tinting the screen in any color depth other than 8-bit using only vanilla Allegro, I&#39;ve found that draw_lit_sprite can increase the framerate. It is not a major increase, but it will work better if you do not wish to tack on an add-on library (which <i>will</i> be the recommended solution).
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Onewing)</author>
		<pubDate>Mon, 19 Mar 2007 08:18:52 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>For 15, 16 and 32-bit modes you can use something like the code below. It does a 50-50 average with the given color. The code below works in 32-bit modes, but you can write 15-bit and 16-bit versions quite easily. Note that it works only for memory bitmaps. <img src="http://www.allegro.cc/forums/smileys/smiley.gif" alt=":)" />
</p><div class="source-code snippet"><div class="inner"><pre><span class="k1">void</span> tint_bitmap<span class="k2">(</span><a href="http://www.allegro.cc/manual/BITMAP" target="_blank"><span class="a">BITMAP</span></a> <span class="k3">*</span>bmp,<span class="k1">int</span> color<span class="k2">)</span>
<span class="k2">{</span>
  color <span class="k3">=</span> <span class="k2">(</span>color&gt;&gt;1<span class="k2">)</span><span class="k3">&amp;</span><span class="n">0x7F7F7F</span><span class="k2">;</span>
  <span class="k1">for</span><span class="k2">(</span><span class="k1">int</span> y<span class="k3">=</span><span class="n">0</span><span class="k2">;</span>y<span class="k3">&lt;</span>bmp-&gt;h<span class="k2">;</span>y<span class="k3">+</span><span class="k3">+</span><span class="k2">)</span>
  <span class="k2">{</span>
    <span class="k1">int</span> <span class="k3">*</span>pixel <span class="k3">=</span> <span class="k2">(</span><span class="k1">int</span><span class="k3">*</span><span class="k2">)</span><span class="k2">(</span>bmp-&gt;line<span class="k2">[</span>y<span class="k2">]</span><span class="k2">)</span><span class="k2">;</span>
    <span class="k1">int</span> <span class="k3">*</span>end <span class="k3">=</span> pixel <span class="k3">+</span> bmp-&gt;w<span class="k2">;</span>
    <span class="k1">while</span><span class="k2">(</span>pixel<span class="k3">&lt;</span>end<span class="k2">)</span>
    <span class="k2">{</span>
      <span class="k3">*</span>pixel <span class="k3">=</span> <span class="k2">(</span><span class="k2">(</span><span class="k3">*</span>pixel&gt;&gt;1<span class="k2">)</span><span class="k3">&amp;</span><span class="n">0x7F7F7F</span><span class="k2">)</span><span class="k3">+</span>color<span class="k2">;</span>
      pixel<span class="k3">+</span><span class="k3">+</span><span class="k2">;</span>
    <span class="k2">}</span>
  <span class="k2">}</span>
<span class="k2">}</span>
</pre></div></div><p>
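A minimal sketch of the 16-bit variant mentioned above, assuming 5-6-5 RGB pixels in a plain row buffer rather than an Allegro BITMAP. The name tint_row_565 and the HALF_MASK_565 constant are made up for this illustration; for 15-bit (5-5-5) pixels the mask would be 0x3DEF instead.
</p><div class="source-code snippet"><div class="inner"><pre>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* Keeps the halved bits of each 5-6-5 channel after a right shift by one. */
#define HALF_MASK_565 0x7BEFu

/* Blend every pixel in a row 50-50 with `color` (both in 5-6-5 format). */
void tint_row_565(uint16_t *pixels, size_t count, uint16_t color)
{
    uint16_t half_color = (uint16_t)((color &gt;&gt; 1) &amp; HALF_MASK_565);
    for (size_t i = 0; i &lt; count; i++) {
        uint16_t half_pixel = (uint16_t)((pixels[i] &gt;&gt; 1) &amp; HALF_MASK_565);
        pixels[i] = (uint16_t)(half_pixel + half_color);
    }
}
</pre></div></div><p>With a 16-bit memory bitmap, each row could presumably be passed as (uint16_t *)bmp-&gt;line[y] with bmp-&gt;w as the count.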
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Krzysztof Kluczek)</author>
		<pubDate>Mon, 19 Mar 2007 08:38:48 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Use fblend! I have been using it for a few days, and it works nicely and fast, and it is very easy to use!</p><p>I don&#39;t know why, but Krzysztof&#39;s code is really fast!<br />I added it to my project and applied it to my buffer bitmap (640x480x32), and it only increased CPU usage by ~10%; fblend_rect_trans() under the same conditions added ~13%.<br />But of course, I could be completely wrong; perhaps that&#39;s not a proper way to compare efficiency.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Paul whoknows)</author>
		<pubDate>Mon, 19 Mar 2007 11:51:36 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>
I don&#39;t know why, but Krzysztof&#39;s code is really fast!
</p></div></div><p>
It&#39;s just simple: it uses only basic operations and works on entire RGB triples at once. It also surely gains some speed from working directly with pointers. You can probably still make it even faster by using MMX and operating on two pixels in every iteration (MMX registers are 64 bits wide), or even four pixels at once in 15 and 16-bit modes. <img src="http://www.allegro.cc/forums/smileys/smiley.gif" alt=":)" /></p><p>Replacing the loop condition with a basic &quot;for&quot; loop can make it a bit faster, but that depends on the compiler&#39;s ability to optimize it into a &quot;loop&quot; instruction. <img src="http://www.allegro.cc/forums/smileys/smiley.gif" alt=":)" /></p><p>The cool thing is that you can use the same approach for other effects, by working out how to express them with a few shifts, additions and other basic operations. <img src="http://www.allegro.cc/forums/smileys/smiley.gif" alt=":)" />
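</p><p>A rough sketch of the two-pixels-per-iteration idea without MMX, assuming 32-bit pixels in a plain row buffer and using an ordinary 64-bit integer in place of an MMX register. The name tint_row_32_pairs and the HALF_MASK_2X constant are hypothetical, not part of Allegro or any library.</p><div class="source-code snippet"><div class="inner"><pre>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

/* Per-channel 7-bit mask, repeated for two packed 32-bit pixels. */
#define HALF_MASK_2X 0x007F7F7F007F7F7FULL

/* Blend `count` 32-bit pixels 50-50 with `color`, two pixels per step. */
void tint_row_32_pairs(uint32_t *pixels, size_t count, uint32_t color)
{
    uint32_t half = (color &gt;&gt; 1) &amp; 0x7F7F7Fu;
    uint64_t half2 = ((uint64_t)half &lt;&lt; 32) | half;
    size_t i = 0;

    for (; i + 2 &lt;= count; i += 2) {
        uint64_t pair;
        memcpy(&amp;pair, &amp;pixels[i], sizeof pair);   /* alignment-safe load */
        /* The bit that leaks across the pixel boundary is masked off. */
        pair = ((pair &gt;&gt; 1) &amp; HALF_MASK_2X) + half2;
        memcpy(&amp;pixels[i], &amp;pair, sizeof pair);
    }
    if (i &lt; count)                                /* odd pixel left over */
        pixels[i] = ((pixels[i] &gt;&gt; 1) &amp; 0x7F7F7Fu) + half;
}
</pre></div></div><p>Whether this beats the one-pixel loop depends on the compiler and CPU, so it is worth measuring before adopting it.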
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Krzysztof Kluczek)</author>
		<pubDate>Mon, 19 Mar 2007 14:16:12 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Also unrolling it might give some boost. On Core 2 based CPUs, using SSE would give a further significant speed increase <img src="http://www.allegro.cc/forums/smileys/smiley.gif" alt=":)" />
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (HoHo)</author>
		<pubDate>Mon, 19 Mar 2007 14:39:08 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>
Also unrolling it might give some boost.
</p></div></div><p>
You can&#39;t really unroll the entire loop, as its length depends on the bitmap width, but unrolling it a bit so that the loop handles four pixels per iteration might be worth it. Unrolling it more won&#39;t make that much difference and will make the loop code longer, which the CPU might not like. <img src="http://www.allegro.cc/forums/smileys/smiley.gif" alt=":)" /></p><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>
On Core 2 based CPUs, using SSE would give a further significant speed increase <img src="http://www.allegro.cc/forums/smileys/smiley.gif" alt=":)" />
</p></div></div><p>
You should be able to do it with SSE2 (Pentium 4). <img src="http://www.allegro.cc/forums/smileys/smiley.gif" alt=":)" />
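</p><p>The partial unrolling described above could look roughly like this in plain C, assuming 32-bit pixels in a plain row buffer. The name tint_row_32_unrolled is hypothetical, and a modern compiler may already do this on its own at higher optimization levels.</p><div class="source-code snippet"><div class="inner"><pre>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

#define HALF_MASK 0x7F7F7Fu

/* Blend `count` 32-bit pixels 50-50 with `color`, four pixels per step. */
void tint_row_32_unrolled(uint32_t *pixels, size_t count, uint32_t color)
{
    uint32_t half = (color &gt;&gt; 1) &amp; HALF_MASK;
    uint32_t *p = pixels;
    uint32_t *end = pixels + count;

    /* Main loop: four pixels per iteration. */
    while (end - p &gt;= 4) {
        p[0] = ((p[0] &gt;&gt; 1) &amp; HALF_MASK) + half;
        p[1] = ((p[1] &gt;&gt; 1) &amp; HALF_MASK) + half;
        p[2] = ((p[2] &gt;&gt; 1) &amp; HALF_MASK) + half;
        p[3] = ((p[3] &gt;&gt; 1) &amp; HALF_MASK) + half;
        p += 4;
    }
    /* Tail: the zero to three pixels that remain. */
    while (p &lt; end) {
        *p = ((*p &gt;&gt; 1) &amp; HALF_MASK) + half;
        p++;
    }
}
</pre></div></div><p>The tail loop is what handles widths that are not a multiple of four.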
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Krzysztof Kluczek)</author>
		<pubDate>Mon, 19 Mar 2007 15:12:49 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>
Unrolling it more won&#39;t make that much difference and will make the loop code longer, which the CPU might not like.
</p></div></div><p>This is true, especially in 32-bit.
</p><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>
You should be able to do it with SSE2 (Pentium 4).
</p></div></div><p>Yes, but clock for clock, Core 2 has twice the SSE throughput of P4 and K8 <img src="http://www.allegro.cc/forums/smileys/wink.gif" alt=";)" /><br />On other CPUs, plain old MMX should give results comparable to SSE2.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (HoHo)</author>
		<pubDate>Mon, 19 Mar 2007 16:32:09 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Use AllegroGL.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Milan Mimica)</author>
		<pubDate>Mon, 19 Mar 2007 17:46:49 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Use AllegroGL.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (GullRaDriel)</author>
		<pubDate>Mon, 19 Mar 2007 18:04:35 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>FBlend supports subbitmaps, uses memory bitmaps correctly, and needs to do more checks for things like 15 vs 16 vs 32 bit. Other than that, you&#39;re probably bandwidth bound (and not compute bound), so MMX/SSE would not help.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Bob)</author>
		<pubDate>Mon, 19 Mar 2007 23:35:15 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Just to make sure... Buffer is a memory bitmap, and not a video bitmap, right? Because doing any kind of blending operation on a video bitmap without the aid of, say, OpenGL is going to be very painful for your FPS.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (X-G)</author>
		<pubDate>Tue, 20 Mar 2007 00:41:46 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>If you happen to use small bitmaps that fit in cache, you probably won&#39;t be that limited by bandwidth. 800x600 at 32-bit takes around 2 MB. If you have a CPU with a big cache, it might be worth using more efficient SIMD instructions. Though if you already have a CPU with a big cache, it will probably be fast enough already <img src="http://www.allegro.cc/forums/smileys/tongue.gif" alt=":P" />
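</p><p>For reference, the buffer-size arithmetic behind the roughly 2 MB figure above can be sketched as below. The helper name frame_bytes is made up for this illustration, and it assumes tightly packed rows with no padding.</p><div class="source-code snippet"><div class="inner"><pre>#include &lt;stddef.h&gt;

/* Bytes needed for one frame; 15-bit pixels still occupy two bytes. */
size_t frame_bytes(size_t width, size_t height, size_t bits_per_pixel)
{
    size_t bytes_per_pixel = (bits_per_pixel + 7) / 8;
    return width * height * bytes_per_pixel;
}
</pre></div></div><p>Under that assumption, frame_bytes(800, 600, 32) is 1,920,000 bytes, and the 800x480 buffer from the first post is 1,536,000 bytes.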
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (HoHo)</author>
		<pubDate>Tue, 20 Mar 2007 00:42:13 +0000</pubDate>
	</item>
</rss>
