<?xml version="1.0"?>
<rss version="2.0">
	<channel>
		<title>Fastest blitting method</title>
		<link>http://www.allegro.cc/forums/view/431144</link>
		<description>Allegro.cc Forum Thread</description>
		<webMaster>matthew@allegro.cc (Matthew Leverton)</webMaster>
		<lastBuildDate>Fri, 19 Nov 2004 01:28:44 +0000</lastBuildDate>
	</channel>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>What is the fastest method? 1) iterative loops, 2) memcpy, 3) something else.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (lucaz)</author>
		<pubDate>Wed, 17 Nov 2004 20:39:49 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Direct memory line access? It really depends on the color depth you are running at: you can copy whole blocks of bytes in 8 bpp, or 32-bit words (two pixels) at a time in 16 bpp, etc.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Steve Terry)</author>
		<pubDate>Wed, 17 Nov 2004 20:44:51 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Direct line access is the first option, right?<br />What I mean by iterative loops is:
</p><div class="source-code snippet"><div class="inner"><pre><span class="k1">for</span><span class="k2">(</span><span class="k1">int</span> x<span class="k3">=</span><span class="n">0</span><span class="k2">;</span> x<span class="k3">&lt;</span>bmp1-&gt;w <span class="k2">;</span>x<span class="k3">+</span><span class="k3">+</span><span class="k2">)</span>
 <span class="k1">for</span><span class="k2">(</span><span class="k1">int</span> y<span class="k3">=</span><span class="n">0</span><span class="k2">;</span> y<span class="k3">&lt;</span>bmp1-&gt;h <span class="k2">;</span>y<span class="k3">+</span><span class="k3">+</span><span class="k2">)</span>
  bmp1-&gt;line<span class="k2">[</span>x<span class="k2">]</span><span class="k2">[</span>y<span class="k2">]</span> <span class="k3">=</span> bmp2-&gt;line<span class="k2">[</span>x<span class="k2">]</span><span class="k2">[</span>y<span class="k2">]</span><span class="k2">;</span>
</pre></div></div><p>
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (lucaz)</author>
		<pubDate>Wed, 17 Nov 2004 20:52:00 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Right...
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Steve Terry)</author>
		<pubDate>Wed, 17 Nov 2004 20:53:47 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>3) MMX.</p><p>But then again, you didn&#39;t specify video/memory/system bitmaps and the destination, and also any blending. <img src="http://www.allegro.cc/forums/smileys/smiley.gif" alt=":)" />
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Chris Katko)</author>
		<pubDate>Wed, 17 Nov 2004 20:57:33 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Is this the fastest method? ¬_¬ ....
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (lucaz)</author>
		<pubDate>Wed, 17 Nov 2004 20:58:57 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Depends on how much time you want to spend on your custom blitter <img src="http://www.allegro.cc/forums/smileys/grin.gif" alt=";D" />  MMX is great but it is kinda complex.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Steve Terry)</author>
		<pubDate>Wed, 17 Nov 2004 21:22:47 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>for(int x=0; x&lt;bmp1-&gt;w ;x++)<br />for(int y=0; y&lt;bmp1-&gt;h ;y++)<br />  bmp1-&gt;line[x][y] = bmp2-&gt;line[x][y];</p></div></div><p>
The bitmap line array is indexed as line[y][x], and remember that the number of bytes per pixel differs between color depths. You should also blit row by row, not column by column, to make better use of the cache. Using pointers can help a bit too.</p><div class="source-code snippet"><div class="inner"><pre><span class="k1">int</span> y,<span class="k3">*</span>s,<span class="k3">*</span>d,<span class="k3">*</span>e<span class="k2">;</span>
<span class="k1">for</span><span class="k2">(</span>y<span class="k3">=</span><span class="n">0</span><span class="k2">;</span> y<span class="k3">&lt;</span>bmp1-&gt;h <span class="k2">;</span>y<span class="k3">+</span><span class="k3">+</span><span class="k2">)</span>
<span class="k2">{</span>
  s <span class="k3">=</span> <span class="k2">(</span><span class="k1">int</span> <span class="k3">*</span><span class="k2">)</span>bmp1-&gt;line<span class="k2">[</span>y<span class="k2">]</span><span class="k2">;</span>  <span class="c">// source row (line[] holds byte pointers, hence the cast)</span>
  e <span class="k3">=</span> s <span class="k3">+</span> <span class="k2">(</span>bmp1-&gt;w<span class="k3">*</span>bpp<span class="k3">+</span><span class="n">3</span><span class="k2">)</span><span class="k3">/</span><span class="n">4</span><span class="k2">;</span>  <span class="c">// bpp = bytes per pixel; the count is rounded up to whole 32-bit words</span>
  d <span class="k3">=</span> <span class="k2">(</span><span class="k1">int</span> <span class="k3">*</span><span class="k2">)</span>bmp2-&gt;line<span class="k2">[</span>y<span class="k2">]</span><span class="k2">;</span>  <span class="c">// destination row</span>
  <span class="k1">while</span><span class="k2">(</span>s<span class="k3">&lt;</span>e<span class="k2">)</span>
    <span class="k3">*</span>d<span class="k3">+</span><span class="k3">+</span> <span class="k3">=</span> <span class="k3">*</span>s<span class="k3">+</span><span class="k3">+</span><span class="k2">;</span>
<span class="k2">}</span>
</pre></div></div><p>
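For comparison, here is a minimal sketch of the same row-by-row idea using memcpy for each row, which avoids the pointer casts. SketchBitmap and copy_rows are hypothetical names made up for this example; the struct only mimics the w, h and line fields of a memory bitmap, and the bytes per pixel value is assumed to be passed in by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a memory bitmap: only the fields used here. */
typedef struct {
    int w, h;              /* size in pixels */
    unsigned char **line;  /* line[y] points at the first byte of row y */
} SketchBitmap;

/* Copy src into dst one row at a time. bytes_per_pixel depends on the
   color depth; both bitmaps must have the same size and depth. */
static void copy_rows(SketchBitmap *dst, const SketchBitmap *src,
                      int bytes_per_pixel)
{
    assert(dst->w == src->w && dst->h == src->h);
    size_t row_bytes = (size_t)src->w * (size_t)bytes_per_pixel;
    for (int y = 0; y < src->h; y++)
        memcpy(dst->line[y], src->line[y], row_bytes);
}
```

Copying exactly w * bytes_per_pixel bytes per row also means nothing is written past the end of a row, which the word-rounded loop can do.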
<img src="http://www.allegro.cc/forums/smileys/smiley.gif" alt=":)" /></p><p>And MMX is completely useless in blitting unless you are doing blending in 24/32bpp, since it just gives additional math operations and none are required here. <img src="http://www.allegro.cc/forums/smileys/smiley.gif" alt=":)" />
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Krzysztof Kluczek)</author>
		<pubDate>Wed, 17 Nov 2004 21:55:15 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Are you sure this is the fastest method? memcpy seems more useful: it doesn&#39;t iterate, it just takes a block of memory and copies it to another.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (lucaz)</author>
		<pubDate>Wed, 17 Nov 2004 22:33:05 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>I&#39;m not saying memcpy wouldn&#39;t work, but it only lets you copy a bitmap, not manipulate it or build a custom blitter. Why not just use blit instead? <img src="http://www.allegro.cc/forums/smileys/tongue.gif" alt=":P" />
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Steve Terry)</author>
		<pubDate>Wed, 17 Nov 2004 22:41:52 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>You can move 8 bytes per instruction using the MMX movq instruction, instead of one unit at a time as with memcpy. Problems: the target machine must support MMX, the addresses should be aligned, and the array size should be a multiple of 4. I have the code at home and will check it later.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (ReyBrujo)</author>
		<pubDate>Wed, 17 Nov 2004 22:43:03 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Just blitting straight to the screen is extremely fast (I can pull thousands of frames per second). It&#39;s putting that data into memory first and then onto the screen that&#39;s slow <img src="http://www.allegro.cc/forums/smileys/tongue.gif" alt=":P" /> (i.e. double buffering)
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Gnatinator)</author>
		<pubDate>Wed, 17 Nov 2004 22:45:35 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>A faster way will always be to use DRS (a dirty rectangle system) instead of updating the whole screen.
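A rough sketch of the idea, assuming two same-sized surfaces and a list of rectangles that changed since the last frame: only the bytes inside each rectangle are copied. DirtyRect, SketchSurface and copy_dirty_rect are hypothetical names for this example and are not part of Allegro or of any DRS package.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration only. */
typedef struct { int x, y, w, h; } DirtyRect;

typedef struct {
    int w, h;              /* size in pixels */
    int bytes_per_pixel;   /* depends on the color depth */
    unsigned char **line;  /* line[y] points at the first byte of row y */
} SketchSurface;

/* Copy only the pixels inside r from src to dst (same size and depth). */
static void copy_dirty_rect(SketchSurface *dst, const SketchSurface *src,
                            DirtyRect r)
{
    assert(dst->w == src->w && dst->h == src->h);
    assert(dst->bytes_per_pixel == src->bytes_per_pixel);
    assert(r.x >= 0 && r.y >= 0);
    assert(r.x + r.w <= src->w && r.y + r.h <= src->h);

    size_t offset = (size_t)r.x * (size_t)src->bytes_per_pixel;
    size_t bytes  = (size_t)r.w * (size_t)src->bytes_per_pixel;
    for (int y = r.y; y < r.y + r.h; y++)
        memcpy(dst->line[y] + offset, src->line[y] + offset, bytes);
}
```

The saving comes from touching fewer bytes per frame, so it helps regardless of how the individual copy is implemented.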
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (ReyBrujo)</author>
		<pubDate>Wed, 17 Nov 2004 22:50:02 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>of course memcpy iterates, how else would it move more than one piece of data?
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Kris Allen)</author>
		<pubDate>Wed, 17 Nov 2004 23:03:32 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>I don&#39;t know its code, but at least it uses just one for(), not two.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (lucaz)</author>
		<pubDate>Wed, 17 Nov 2004 23:07:40 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>What exactly are you trying to accomplish? We may be able to find the best method for what you want to implement.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Steve Terry)</author>
		<pubDate>Wed, 17 Nov 2004 23:16:03 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Kris, there is a difference between manually iterating (using a loop and a jump) and letting the processor iterate for you (for example, using <i>rep movsb</i>):</p><div class="source-code"><div class="toolbar"></div><div class="inner"><table width="100%"><tbody><tr><td class="number">1</td><td><span class="k2">;</span>  <span class="k1">this</span> is looping</td></tr><tr><td class="number">2</td><td><span class="k1">xor</span> ecx, ecx</td></tr><tr><td class="number">3</td><td>mov esi, source_string</td></tr><tr><td class="number">4</td><td>mov edi, target_string</td></tr><tr><td class="number">5</td><td>begin_loop:</td></tr><tr><td class="number">6</td><td>    mov al, <span class="k2">[</span>esi<span class="k3">+</span>ecx<span class="k2">]</span>    <span class="k2">;</span> fetch a byte from the source string</td></tr><tr><td class="number">7</td><td>    mov <span class="k2">[</span>edi<span class="k3">+</span>ecx<span class="k2">]</span>, al    <span class="k2">;</span> put the byte in the target string</td></tr><tr><td class="number">8</td><td>    inc ecx</td></tr><tr><td class="number">9</td><td>    cmp ecx, string_length  <span class="k2">;</span> have we copied every byte yet?</td></tr><tr><td class="number">10</td><td>jb begin_loop</td></tr><tr><td class="number">11</td><td>&#160;</td></tr><tr><td class="number">12</td><td><span class="k2">;</span>  <span class="k1">this</span> is probably the <a href="http://www.delorie.com/djgpp/doc/libc/libc_566.html" target="_blank">memcpy</a> way</td></tr><tr><td class="number">13</td><td>mov ecx, string_length</td></tr><tr><td class="number">14</td><td>mov esi, source_string</td></tr><tr><td class="number">15</td><td>mov edi, target_string</td></tr><tr><td class="number">16</td><td>rep movsb                <span class="k2">;</span> keep repeating movsb <span class="k2">(</span>copy a byte from</td></tr><tr><td class="number">17</td><td>                         <span class="k2">;</span> <span class="k2">[</span>esi<span class="k2">]</span> to <span class="k2">[</span>edi<span class="k2">]</span>, advance both, decrease ecx<span class="k2">)</span></td></tr><tr><td class="number">18</td><td>                         <span class="k2">;</span> until ecx is <span class="n">0</span>.</td></tr></tbody></table></div></div><p>
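In C terms the two listings correspond roughly to the two functions below. This is only a sketch with made-up names (copy_bytes_loop, copy_bytes_memcpy); how memcpy is really implemented depends on the C library and the CPU, and an optimizing compiler may well turn the explicit loop into the same code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The explicit loop: one byte per iteration, like the first listing. */
static void copy_bytes_loop(unsigned char *dst, const unsigned char *src,
                            size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* The library call: the implementation is free to use string
   instructions or wider moves internally. */
static void copy_bytes_memcpy(unsigned char *dst, const unsigned char *src,
                              size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    memcpy(dst, src, n);
}
```

Both produce the same bytes; any speed difference has to be measured on the target machine.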
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (ReyBrujo)</author>
		<pubDate>Wed, 17 Nov 2004 23:17:35 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>ah cool, didnt know about that, no wonder it&#39;s so fast :B
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Kris Allen)</author>
		<pubDate>Wed, 17 Nov 2004 23:22:08 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>I&#39;m not sure, but I think that my method, and any other that should be faster than it, will be limited by memory bus bandwidth anyway.</p><p>And of course the fastest blitting method is to use hardware acceleration. <img src="http://www.allegro.cc/forums/smileys/smiley.gif" alt=":)" />
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Krzysztof Kluczek)</author>
		<pubDate>Wed, 17 Nov 2004 23:28:44 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>I wouldn&#39;t bother writing your own blitting routines unless blending is involved, and maybe not even then.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (decsonic)</author>
		<pubDate>Wed, 17 Nov 2004 23:52:44 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>
I really don&#39;t think it&#39;s worth spending the time trying to optimise things like this. Doing the rest of the program is more important.. <img src="http://www.allegro.cc/forums/smileys/smiley.gif" alt=":)" />
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Richard Phipps)</author>
		<pubDate>Thu, 18 Nov 2004 00:26:15 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>who likes to optimise blit?
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (lucaz)</author>
		<pubDate>Thu, 18 Nov 2004 00:40:52 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Well, first optimize your game fully. You don&#39;t try to optimize <i>printf</i>; you first try to optimize your program so that it doesn&#39;t call it that often.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (ReyBrujo)</author>
		<pubDate>Thu, 18 Nov 2004 00:44:46 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>
</p><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>who likes to optimise blit?</p></div></div><p>

Er.. your thread title is &#39;fastest blitting method&#39; <img src="http://www.allegro.cc/forums/smileys/huh.gif" alt="???" /></p><p>(And I didn&#39;t say blit, I said things like this..)
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Richard Phipps)</author>
		<pubDate>Thu, 18 Nov 2004 00:56:15 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>read my sig <img src="http://www.allegro.cc/forums/smileys/grin.gif" alt=";D" />
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (decsonic)</author>
		<pubDate>Thu, 18 Nov 2004 01:07:57 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Can someone explain to me what the hell is going on? Are we optimizing blit or optimizing your code? If you need to optimize your code, that has nothing to do with blit: make your algorithms faster, not blit.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Steve Terry)</author>
		<pubDate>Thu, 18 Nov 2004 01:55:53 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Fastest blitting method is, as KK said, your video card&#39;s video ram-&gt;video ram accelerated blit.</p><p>End of story!<br />And as far as actual time is concerned, the fastest blitting method is:
</p><div class="source-code snippet"><div class="inner"><pre><a href="http://www.allegro.cc/manual/blit" target="_blank"><span class="a">blit</span></a><span class="k2">(</span>something<span class="k2">)</span><span class="k2">;</span>
</pre></div></div><p>
because you don&#39;t start a thread like this, waste time debating things, and blit is only 4 letters! <img src="http://www.allegro.cc/forums/smileys/shocked.gif" alt=":o" />
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Billybob)</author>
		<pubDate>Thu, 18 Nov 2004 02:28:15 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title">1 said:</div><div class="quote"><p>
What is the fastest method? 1) iterative loops, 2) memcpy, 3) something else.
</p></div></div><p>
</p><div class="quote_container"><div class="title">2 said:</div><div class="quote"><p>
Direct line access is the first option, right?<br />What I mean by iterative loops is:<br />for(int x=0; x&lt;bmp1-&gt;w ;x++)<br />for(int y=0; y&lt;bmp1-&gt;h ;y++)<br />  bmp1-&gt;line[x][y] = bmp2-&gt;line[x][y];
</p></div></div><p>
</p><div class="quote_container"><div class="title">3 said:</div><div class="quote"><p>
Is this the fastest method? ¬_¬ ....
</p></div></div><p>
</p><div class="quote_container"><div class="title">4 said:</div><div class="quote"><p>
Are you sure this is the fastest method? memcpy seems more useful: it doesn&#39;t iterate, it just takes a block of memory and copies it to another.
</p></div></div><p>
</p><div class="quote_container"><div class="title">5 said:</div><div class="quote"><p>
I don&#39;t know its code, but at least it uses just one for(), not two.
</p></div></div><p>
</p><div class="quote_container"><div class="title">6 said:</div><div class="quote"><p>
who likes to optimise blit?
</p></div></div><p>

I never said that I&#39;m trying to optimise.<br />I just wanted to know, as my topic says, which method is the fastest.</p><p>ReyBrujo explained to me why memcpy() is better than using loops, and that was what I wanted to know.<br />Thanks!
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (lucaz)</author>
		<pubDate>Thu, 18 Nov 2004 04:43:07 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Here is the code I used to optimize. Note that if the processor does not have MMX support, it just copies with <i>memcpy</i>. Of course, the bitmaps must be aligned, and you cannot use this with the screen bitmap. I tried it a couple of times with that old project (the DRS system), and it worked quite well. I can tell you it won&#39;t crash as long as you meet the requirements <img src="http://www.allegro.cc/forums/smileys/tongue.gif" alt=":P" /></p><div class="source-code"><div class="toolbar"></div><div class="inner"><table width="100%"><tbody><tr><td class="number">1</td><td><span class="c">//                                                                           //</span></td></tr><tr><td class="number">2</td><td><span class="c">//  The MMX optimization code is here. Hmm... I don't really know if this    //</span></td></tr><tr><td class="number">3</td><td><span class="c">//  should be public for everyone (I mean, as a header, and not as another   //</span></td></tr><tr><td class="number">4</td><td><span class="c">//  source file), but anyway, it is easier this way.                         //</span></td></tr><tr><td class="number">5</td><td><span class="c">//                                                                           //</span></td></tr><tr><td class="number">6</td><td><span class="c">//  -----------------------------------------------------------------------  //</span></td></tr><tr><td class="number">7</td><td><span class="c">//                                                                           //</span></td></tr><tr><td class="number">8</td><td><span class="c">//      This file is a part of DRS (alpha) package.                          
//</span></td></tr><tr><td class="number">9</td><td><span class="c">//      Copyright (C) 2002  Roberto Alfonso (aka ReyBrujo)                   //</span></td></tr><tr><td class="number">10</td><td><span class="c">//                          reybrujo@hotmail.com                             //</span></td></tr><tr><td class="number">11</td><td><span class="c">//                                                                           //</span></td></tr><tr><td class="number">12</td><td><span class="c">//      This package is free software; you can redistribute it and/or        //</span></td></tr><tr><td class="number">13</td><td><span class="c">//      modify it under the terms of the GNU General Public License as       //</span></td></tr><tr><td class="number">14</td><td><span class="c">//      published by the Free Software Foundation; either version 2,         //</span></td></tr><tr><td class="number">15</td><td><span class="c">//      or (at your option) any later version.                               //</span></td></tr><tr><td class="number">16</td><td><span class="c">//                                                                           //</span></td></tr><tr><td class="number">17</td><td><span class="c">//      This package is distributed in the hope that it will be useful,      //</span></td></tr><tr><td class="number">18</td><td><span class="c">//      but WITHOUT ANY WARRANTY; without even the implied warranty of       //</span></td></tr><tr><td class="number">19</td><td><span class="c">//      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the         //</span></td></tr><tr><td class="number">20</td><td><span class="c">//      GNU General Public License for more details.                         
//</span></td></tr><tr><td class="number">21</td><td><span class="c">//                                                                           //</span></td></tr><tr><td class="number">22</td><td><span class="c">//      You should have received a copy of the GNU General Public            //</span></td></tr><tr><td class="number">23</td><td><span class="c">//      License along with this package (see the file COPYING). If not,      //</span></td></tr><tr><td class="number">24</td><td><span class="c">//      write to the Free Software Foundation, Inc., 59 Temple Place,        //</span></td></tr><tr><td class="number">25</td><td><span class="c">//      Suite 330, Boston, MA  02111-1307  USA                               //</span></td></tr><tr><td class="number">26</td><td><span class="c">//                                                                           //</span></td></tr><tr><td class="number">27</td><td><span class="c">//  -----------------------------------------------------------------------  //</span></td></tr><tr><td class="number">28</td><td><span class="c">//                                                                           //</span></td></tr><tr><td class="number">29</td><td><span class="p">#ifndef _MMXCODE_H_INCLUDED</span></td></tr><tr><td class="number">30</td><td><span class="p">#define _MMXCODE_H_INCLUDED 0xDEAD</span></td></tr><tr><td class="number">31</td><td>&#160;</td></tr><tr><td class="number">32</td><td>&#160;</td></tr><tr><td class="number">33</td><td>&#160;</td></tr><tr><td class="number">34</td><td><span class="p">#ifdef __cplusplus</span></td></tr><tr><td class="number">35</td><td><span class="k1">extern</span> <span class="s">"C"</span> <span class="k2">{</span></td></tr><tr><td class="number">36</td><td><span class="p">#endif</span></td></tr><tr><td class="number">37</td><td>&#160;</td></tr><tr><td class="number">38</td><td><span class="c">//                                                                           
//</span></td></tr><tr><td class="number">39</td><td><span class="c">//  Whenever I needed to update the internal bitmap I just used blit(). But  //</span></td></tr><tr><td class="number">40</td><td><span class="c">//  I noticed it was very slow, so I tried just (since the background and    //</span></td></tr><tr><td class="number">41</td><td><span class="c">//  the internal bitmap have the same size) memcpy() the 'line' pointers of  //</span></td></tr><tr><td class="number">42</td><td><span class="c">//  the bitmap struct. But that gave some boost, but not enough.             //</span></td></tr><tr><td class="number">43</td><td><span class="c">//                                                                           //</span></td></tr><tr><td class="number">44</td><td><span class="c">//  Now, and since it was still slow, I decided to try MMX. I defined a new  //</span></td></tr><tr><td class="number">45</td><td><span class="c">//  macro, MMX_MEMCPY, which checks if your hardware support MMX set (by     //</span></td></tr><tr><td class="number">46</td><td><span class="c">//  checking Allegro cpu_capabilities global variable). If so, it uses the   //</span></td></tr><tr><td class="number">47</td><td><span class="c">//  movq instruction to copy 32 bytes each cycle.                            //</span></td></tr><tr><td class="number">48</td><td><span class="c">//                                                                           //</span></td></tr><tr><td class="number">49</td><td><span class="c">//  WARNING!                                                                 //</span></td></tr><tr><td class="number">50</td><td><span class="c">//  Though it increases the frame rate (with 25 objects and seed 55555555    //</span></td></tr><tr><td class="number">51</td><td><span class="c">//  increases +30FPS here), be careful! This is still a hack, and I did not  //</span></td></tr><tr><td class="number">52</td><td><span class="c">//  even care about aligning. 
The code expects to find a '_size' multiple    //</span></td></tr><tr><td class="number">53</td><td><span class="c">//  of 32 (like 640x480, 320x200, etc, etc, etc). But with odd sizes (maybe  //</span></td></tr><tr><td class="number">54</td><td><span class="c">//  you have set a bitmap of 319x111, which is not multiple of 32), there    //</span></td></tr><tr><td class="number">55</td><td><span class="c">//  will be some bytes that are not going to be copied.                      //</span></td></tr><tr><td class="number">56</td><td><span class="c">//                                                                           //</span></td></tr><tr><td class="number">57</td><td><span class="c">//  Why I just copy 32 each cycle and not 8x8 = 64 bytes each cycle? Well,   //</span></td></tr><tr><td class="number">58</td><td><span class="c">//  the redirection from memory (first you copy from [esi], then [esi+8],    //</span></td></tr><tr><td class="number">59</td><td><span class="c">//  then [esi+16], etc) takes some time, and will drain all advantage we     //</span></td></tr><tr><td class="number">60</td><td><span class="c">//  get by copying several bytes at once. According to my tests, 32 bytes    //</span></td></tr><tr><td class="number">61</td><td><span class="c">//  each cycle gives a good speed.                                           //</span></td></tr><tr><td class="number">62</td><td><span class="c">//                                                                           //</span></td></tr><tr><td class="number">63</td><td><span class="c">//  Since the object creates the internal and background bitmaps taking the  //</span></td></tr><tr><td class="number">64</td><td><span class="c">//  screen width and size, and since I haven't seen any odd resolution, the  //</span></td></tr><tr><td class="number">65</td><td><span class="c">//  object itself shouldn't have problems at all. 
But if you take this code  //</span></td></tr><tr><td class="number">66</td><td><span class="c">//  to implement your own fast_copy_bitmap() function, be warned: you need   //</span></td></tr><tr><td class="number">67</td><td><span class="c">//  to align the data and manually copy the bytes that are not copied using  //</span></td></tr><tr><td class="number">68</td><td><span class="c">//  the cycle.                                                               //</span></td></tr><tr><td class="number">69</td><td><span class="c">//                                                                           //</span></td></tr><tr><td class="number">70</td><td>&#160;</td></tr><tr><td class="number">71</td><td>&#160;</td></tr><tr><td class="number">72</td><td>&#160;</td></tr><tr><td class="number">73</td><td><span class="c">//                                                                           //</span></td></tr><tr><td class="number">74</td><td><span class="c">//  The user should not include this header directly. It is for his safety,  //</span></td></tr><tr><td class="number">75</td><td><span class="c">//  I don't really care if he deletes the #error directive and hack this by  //</span></td></tr><tr><td class="number">76</td><td><span class="c">//  him/herself.                                                             
//</span></td></tr><tr><td class="number">77</td><td><span class="c">//                                                                           //</span></td></tr><tr><td class="number">78</td><td><span class="p">#ifndef _DRS_H_INCLUDED</span></td></tr><tr><td class="number">79</td><td>    <span class="p">#error You should not include this file directly!</span></td></tr><tr><td class="number">80</td><td><span class="p">#endif</span></td></tr><tr><td class="number">81</td><td>&#160;</td></tr><tr><td class="number">82</td><td>&#160;</td></tr><tr><td class="number">83</td><td>&#160;</td></tr><tr><td class="number">84</td><td><span class="p">#ifdef USE_MMX</span></td></tr><tr><td class="number">85</td><td><span class="p">#ifdef __GNUC__</span></td></tr><tr><td class="number">86</td><td>    <span class="c">//</span></td></tr><tr><td class="number">87</td><td>    <span class="c">//  MMX code for DJGPP, MingW32 and probably Linux. Sorry, but cannot test</span></td></tr><tr><td class="number">88</td><td>    <span class="c">//  Linux version for now until getting a new harddisk  :(</span></td></tr><tr><td class="number">89</td><td>    <span class="c">//</span></td></tr><tr><td class="number">90</td><td>&#160;</td></tr><tr><td class="number">91</td><td>    <span class="p">#define MMX_MEMCPY(_t, _s, _size)             \</span></td></tr><tr><td class="number">92</td><td><span class="p">        if (cpu_capabilities &amp; CPU_MMX) {         \ </span></td></tr><tr><td class="number">93</td><td><span class="p">            asm(                                  \ </span></td></tr><tr><td class="number">94</td><td><span class="p">                "0:                    \n\t"      \ </span></td></tr><tr><td class="number">95</td><td><span class="p">                "movq   (%%esi), %%mm0 \n\t"      \ </span></td></tr><tr><td class="number">96</td><td><span class="p">                "movq  8(%%esi), %%mm1 \n\t"      \ </span></td></tr><tr><td class="number">97</td><td><span class="p">  
              "movq 16(%%esi), %%mm2 \n\t"      \ </span></td></tr><tr><td class="number">98</td><td><span class="p">                "movq 24(%%esi), %%mm3 \n\t"      \ </span></td></tr><tr><td class="number">99</td><td><span class="p">                "movq %%mm0,   (%%edi) \n\t"      \ </span></td></tr><tr><td class="number">100</td><td><span class="p">                "movq %%mm1,  8(%%edi) \n\t"      \ </span></td></tr><tr><td class="number">101</td><td><span class="p">                "movq %%mm2, 16(%%edi) \n\t"      \ </span></td></tr><tr><td class="number">102</td><td><span class="p">                "movq %%mm3, 24(%%edi) \n\t"      \ </span></td></tr><tr><td class="number">103</td><td><span class="p">                "addl $32, %%esi       \n\t"      \ </span></td></tr><tr><td class="number">104</td><td><span class="p">                "addl $32, %%edi       \n\t"      \ </span></td></tr><tr><td class="number">105</td><td><span class="p">                "decl %%ecx            \n\t"      \ </span></td></tr><tr><td class="number">106</td><td><span class="p">                "jnz  0b               \n\t"      \ </span></td></tr><tr><td class="number">107</td><td><span class="p">                "emms                  \n\t"      \ </span></td></tr><tr><td class="number">108</td><td><span class="p">                : : "c" ((_size) &gt;&gt; 5),           \ </span></td></tr><tr><td class="number">109</td><td><span class="p">                    "S" (_s-&gt;line[0]),            \ </span></td></tr><tr><td class="number">110</td><td><span class="p">                    "D" (_t-&gt;line[0])             \ </span></td></tr><tr><td class="number">111</td><td><span class="p">            );                                    \ </span></td></tr><tr><td class="number">112</td><td><span class="p">        }                                         \ </span></td></tr><tr><td class="number">113</td><td><span class="p">        else                                      \ 
</span></td></tr><tr><td class="number">114</td><td><span class="p">            memcpy(_t-&gt;line[0], _s-&gt;line[0], (_size)) </span></td></tr><tr><td class="number">115</td><td><span class="p">#else // !__GNUC__</span></td></tr><tr><td class="number">116</td><td>&#160;</td></tr><tr><td class="number">117</td><td>    <span class="c">//</span></td></tr><tr><td class="number">118</td><td>    <span class="c">// MMX code for MSVC and, probably, BCC. MSVC doesn't understand the code</span></td></tr><tr><td class="number">119</td><td>    <span class="c">// as a macro, so I set it as an inline function.</span></td></tr><tr><td class="number">120</td><td>    <span class="c">//</span></td></tr><tr><td class="number">121</td><td>    <span class="k1">inline</span> <span class="k1">void</span> mmx_memcpy<span class="k2">(</span><span class="k1">unsigned</span> <span class="k1">char</span> <span class="k3">*</span>target,</td></tr><tr><td class="number">122</td><td>                           <span class="k1">unsigned</span> <span class="k1">char</span> <span class="k3">*</span>source, <span class="k1">long</span> amount<span class="k2">)</span> <span class="k2">{</span></td></tr><tr><td class="number">123</td><td>        <span class="k1">if</span> <span class="k2">(</span><a href="http://www.allegro.cc/manual/cpu_capabilities" target="_blank"><span class="a">cpu_capabilities</span></a> <span class="k3">&amp;</span> CPU_MMX<span class="k2">)</span> <span class="k2">{</span></td></tr><tr><td class="number">124</td><td>            __asm <span class="k2">{</span></td></tr><tr><td class="number">125</td><td>                mov  ecx, amount</td></tr><tr><td class="number">126</td><td>                mov  esi, source</td></tr><tr><td class="number">127</td><td>                mov  edi, target</td></tr><tr><td class="number">128</td><td>                again:</td></tr><tr><td class="number">129</td><td>                movq mm0, <span class="k2">[</span>esi   <span 
class="k2">]</span></td></tr><tr><td class="number">130</td><td>                movq mm1, <span class="k2">[</span>esi<span class="k3">+</span> <span class="n">8</span><span class="k2">]</span></td></tr><tr><td class="number">131</td><td>                movq mm2, <span class="k2">[</span>esi<span class="k3">+</span><span class="n">16</span><span class="k2">]</span></td></tr><tr><td class="number">132</td><td>                movq mm3, <span class="k2">[</span>esi<span class="k3">+</span><span class="n">24</span><span class="k2">]</span></td></tr><tr><td class="number">133</td><td>                movq <span class="k2">[</span>edi   <span class="k2">]</span>, mm0</td></tr><tr><td class="number">134</td><td>                movq <span class="k2">[</span>edi<span class="k3">+</span> <span class="n">8</span><span class="k2">]</span>, mm1</td></tr><tr><td class="number">135</td><td>                movq <span class="k2">[</span>edi<span class="k3">+</span><span class="n">16</span><span class="k2">]</span>, mm2</td></tr><tr><td class="number">136</td><td>                movq <span class="k2">[</span>edi<span class="k3">+</span><span class="n">24</span><span class="k2">]</span>, mm3</td></tr><tr><td class="number">137</td><td>                add  esi, <span class="n">32</span></td></tr><tr><td class="number">138</td><td>                add  edi, <span class="n">32</span></td></tr><tr><td class="number">139</td><td>                dec  ecx</td></tr><tr><td class="number">140</td><td>                jnz  again</td></tr><tr><td class="number">141</td><td>                emms</td></tr><tr><td class="number">142</td><td>            <span class="k2">}</span></td></tr><tr><td class="number">143</td><td>        <span class="k2">}</span></td></tr><tr><td class="number">144</td><td>        <span class="k1">else</span></td></tr><tr><td class="number">145</td><td>            <a href="http://www.delorie.com/djgpp/doc/libc/libc_566.html" target="_blank">memcpy</a><span 
class="k2">(</span>target, source, amount <span class="k3">*</span> <span class="n">32</span><span class="k2">)</span><span class="k2">;</span> <span class="c">// amount counts 32-byte blocks, not bytes</span></td></tr><tr><td class="number">146</td><td>    <span class="k2">}</span></td></tr><tr><td class="number">147</td><td>&#160;</td></tr><tr><td class="number">148</td><td>    <span class="p">#define MMX_MEMCPY(_t, _s, _sz)   \</span></td></tr><tr><td class="number">149</td><td><span class="p">        mmx_memcpy(_t-&gt;line[0], _s-&gt;line[0], (_sz) &gt;&gt; 5) </span></td></tr><tr><td class="number">150</td><td><span class="p">#endif</span></td></tr><tr><td class="number">151</td><td><span class="p">#else // ! USE_MMX</span></td></tr><tr><td class="number">152</td><td>    <span class="p">#define MMX_MEMCPY(_t, _s, _sz)   \</span></td></tr><tr><td class="number">153</td><td><span class="p">        memcpy(_t-&gt;line[0], _s-&gt;line[0], (_sz)) </span></td></tr><tr><td class="number">154</td><td><span class="p">#endif</span></td></tr><tr><td class="number">155</td><td>&#160;</td></tr><tr><td class="number">156</td><td>&#160;</td></tr><tr><td class="number">157</td><td><span class="p">#ifdef __cplusplus</span></td></tr><tr><td class="number">158</td><td><span class="k2">}</span></td></tr><tr><td class="number">159</td><td><span class="p">#endif</span></td></tr><tr><td class="number">160</td><td>&#160;</td></tr><tr><td class="number">161</td><td>&#160;</td></tr><tr><td class="number">162</td><td>&#160;</td></tr><tr><td class="number">163</td><td><span class="p">#endif // _MMXCODE_H_INCLUDED</span></td></tr></tbody></table></div></div><p>
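Note that the MMX loops above only move whole 32-byte blocks, so a size that is not a multiple of 32 would leave its last bytes uncopied, and a block count of zero would wrap around in a dec/jnz loop. Below is a rough, portable sketch of one way to guard against both. The names copy_blocks and copy_with_tail are made up for this example, and copy_blocks is only a plain C stand-in for the MMX loop, not the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BLOCK = 32 };  /* bytes moved per loop iteration in the MMX code */

/* Hypothetical stand-in for the MMX loop: copies `blocks` 32-byte blocks. */
static void copy_blocks(unsigned char *dst, const unsigned char *src,
                        size_t blocks)
{
    while (blocks--) {
        memcpy(dst, src, BLOCK);
        dst += BLOCK;
        src += BLOCK;
    }
}

/* Copies exactly `size` bytes: whole blocks first, then the remainder. */
static void copy_with_tail(unsigned char *dst, const unsigned char *src,
                           size_t size)
{
    size_t blocks = size / BLOCK;   /* same value as size >> 5 */
    size_t done = blocks * BLOCK;   /* bytes handled by the block loop */

    assert(dst != NULL && src != NULL);

    if (blocks > 0)                 /* never enter the block loop with a zero count */
        copy_blocks(dst, src, blocks);
    memcpy(dst + done, src + done, size - done);  /* 0 to 31 leftover bytes */
}
```

With this shape, the MMX routine could replace copy_blocks without the caller having to care whether the bitmap size is a multiple of 32.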
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (ReyBrujo)</author>
		<pubDate>Thu, 18 Nov 2004 06:35:17 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Thanks once more, ReyBrujo!<br />I&#39;m not trying to optimise; my idea is to write my own blit so I can use it on a machine without Allegro.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (lucaz)</author>
		<pubDate>Thu, 18 Nov 2004 08:36:48 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>I agree with Richard: make your game first, make it run faster later.<br />But of course, all of us want to own the fastest blitter ever made. <img src="http://www.allegro.cc/forums/smileys/cool.gif" alt="8-)" />
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Paul whoknows)</author>
		<pubDate>Thu, 18 Nov 2004 10:20:23 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p><img src="http://www.allegro.cc/forums/smileys/shocked.gif" alt=":o" /> People, you are crazy! I&#39;m not trying to optimise!!! Ahhhhhhhh
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (lucaz)</author>
		<pubDate>Fri, 19 Nov 2004 00:27:21 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Then say that you are writing your own blitter; it makes more sense now. Since you are not using Allegro, I&#39;m not sure what you are using. If it&#39;s good ol&#39; mode 13h, then memcpy would work best, since your bitmap is probably stored linearly anyway, as is the screen. Just beware that the screen &quot;wraps&quot; around: a row that runs past the right edge continues on the next line. <img src="http://www.allegro.cc/forums/smileys/smiley.gif" alt=":)" />
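<br />A minimal sketch of such a memcpy-based blit, assuming an 8bpp linear buffer (as in mode 13h, 320x200). The Surface struct and the blit8 name are made up for this example and are not Allegro API. Each row is clipped before it is copied, so nothing spills onto the next line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 8bpp surface: `pitch` is the byte distance between rows. */
typedef struct {
    unsigned char *pixels;
    int w, h, pitch;
} Surface;

/* Copies src onto dst at (dx, dy), clipped to dst, one memcpy per row. */
static void blit8(Surface *dst, const Surface *src, int dx, int dy)
{
    assert(dst != NULL && src != NULL);

    int sx = 0, sy = 0;
    int w = src->w, h = src->h;

    /* Clip against the left and top edges of the destination. */
    if (dx < 0) { sx = -dx; w += dx; dx = 0; }
    if (dy < 0) { sy = -dy; h += dy; dy = 0; }

    /* Clip against the right and bottom edges. */
    if (dx + w > dst->w) w = dst->w - dx;
    if (dy + h > dst->h) h = dst->h - dy;
    if (w <= 0 || h <= 0) return;   /* entirely off-screen */

    for (int y = 0; y < h; y++)
        memcpy(dst->pixels + (size_t)(dy + y) * dst->pitch + dx,
               src->pixels + (size_t)(sy + y) * src->pitch + sx,
               (size_t)w);
}
```

If source and destination have the same width and pitch and nothing is clipped, the whole thing collapses to a single memcpy of w * h bytes.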
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Steve Terry)</author>
		<pubDate>Fri, 19 Nov 2004 01:28:44 +0000</pubDate>
	</item>
</rss>
