<?xml version="1.0"?>
<rss version="2.0">
	<channel>
		<title>&#39;Fixed point&#39;</title>
		<link>http://www.allegro.cc/forums/view/488085</link>
		<description>Allegro.cc Forum Thread</description>
		<webMaster>matthew@allegro.cc (Matthew Leverton)</webMaster>
		<lastBuildDate>Tue, 17 May 2005 23:26:23 +0000</lastBuildDate>
	</channel>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Having a poke around my code in OS X I had a glance into the Allegro &#39;fixed point&#39; routines. I observed this is the active fixmul for my platform:</p><p>AL_INLINE(fixed, fixmul, (fixed x, fixed y),<br />{<br />   return ftofix(fixtof(x) * fixtof(y));<br />})</p><p>This is horrendously bad! Float/int conversions are even slower on PowerPC than Intel because the two units of the processor are conceptually separated and a transfer through memory must occur. Would the following not be a smarter fallback for machines without an assembly implementation:</p><div class="source-code snippet"><div class="inner"><pre><span class="k1">unsigned</span> <span class="k1">int</span> sign <span class="k3">=</span> <span class="k2">(</span>r.value^value<span class="k2">)</span><span class="k3">&amp;</span><span class="n">0x80000000</span><span class="k2">;</span>
Sint32 Multiplicand <span class="k3">=</span> value<span class="k2">;</span>

<span class="k1">if</span><span class="k2">(</span>Multiplicand <span class="k3">&lt;</span> <span class="n">0</span><span class="k2">)</span> Multiplicand <span class="k3">=</span> <span class="k3">-</span>Multiplicand<span class="k2">;</span>
<span class="k1">if</span><span class="k2">(</span>r.value <span class="k3">&lt;</span> <span class="n">0</span><span class="k2">)</span> r.value <span class="k3">=</span> <span class="k3">-</span>r.value<span class="k2">;</span>

Fixed Newval<span class="k2">;</span>
Newval.value <span class="k3">=</span> 
  <span class="k2">(</span>Multiplicand <span class="k3">&gt;</span><span class="k3">&gt;</span> <span class="n">8</span><span class="k2">)</span><span class="k3">*</span><span class="k2">(</span>r.value <span class="k3">&gt;</span><span class="k3">&gt;</span> <span class="n">8</span><span class="k2">)</span> <span class="k3">+</span>
  <span class="k2">(</span><span class="k2">(</span><span class="k2">(</span>Multiplicand <span class="k3">&gt;</span><span class="k3">&gt;</span> <span class="n">8</span><span class="k2">)</span><span class="k3">*</span><span class="k2">(</span>r.value<span class="k3">&amp;</span><span class="n">0xff</span><span class="k2">)</span><span class="k2">)</span> <span class="k3">&gt;</span><span class="k3">&gt;</span> <span class="n">8</span><span class="k2">)</span> <span class="k3">+</span>
  <span class="k2">(</span><span class="k2">(</span><span class="k2">(</span>r.value <span class="k3">&gt;</span><span class="k3">&gt;</span> <span class="n">8</span><span class="k2">)</span><span class="k3">*</span><span class="k2">(</span>Multiplicand<span class="k3">&amp;</span><span class="n">0xff</span><span class="k2">)</span><span class="k2">)</span> <span class="k3">&gt;</span><span class="k3">&gt;</span> <span class="n">8</span><span class="k2">)</span><span class="k2">;</span>

<span class="k1">if</span><span class="k2">(</span>sign<span class="k2">)</span> Newval.value <span class="k3">=</span> <span class="k3">-</span>Newval.value<span class="k2">;</span>

<span class="k1">return</span> Newval<span class="k2">;</span>
</pre></div></div><p>

Or some &quot;smarter about signs and shifts&quot; variation? And while I&#39;m here posting, could there not be some compile time way of determining whether real fixed point or nasty triple cast fixed point implementations of things like fixmul are available?
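</p><p>For reference, here is the same idea as a minimal, self-contained sketch rather than a tested patch. It assumes a 16.16 value held in an int32_t; the name fixmul_split and the fixed typedef are made up for illustration and are not Allegro&#39;s. Working on unsigned magnitudes keeps the cross terms from overflowing a signed int, and because the low-byte times low-byte term is dropped, the result can come out up to 2 units in the last place below the exact product. There is no saturation on overflow.

```c
#include <stdint.h>

/* Assumption: 16.16 fixed point stored in a 32-bit signed integer. */
typedef int32_t fixed;

/* Hypothetical helper: multiply two 16.16 values using only 32-bit
 * integer operations. Each operand is split as hi * 256 + lo, and the
 * lo * lo partial product is ignored, so the result may be up to
 * 2 LSBs smaller in magnitude than the exact product. */
static fixed fixmul_split(fixed x, fixed y)
{
    /* The result is negative when exactly one operand is negative. */
    int negative = (x < 0) != (y < 0);

    /* Take magnitudes in unsigned arithmetic (well defined even for
     * the most negative value). */
    uint32_t a = x < 0 ? 0u - (uint32_t)x : (uint32_t)x;
    uint32_t b = y < 0 ? 0u - (uint32_t)y : (uint32_t)y;

    uint32_t a_hi = a >> 8, a_lo = a & 0xff;
    uint32_t b_hi = b >> 8, b_lo = b & 0xff;

    /* hi*hi is already scaled correctly; the cross terms need >> 8. */
    uint32_t r = a_hi * b_hi
               + ((a_hi * b_lo) >> 8)
               + ((b_hi * a_lo) >> 8);

    return negative ? -(fixed)r : (fixed)r;
}
```

With these definitions, fixmul_split(0x20000, 0x30000) is 2.0 times 3.0 and gives 0x60000, i.e. 6.0.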
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Thomas Harte)</author>
		<pubDate>Sun, 08 May 2005 17:15:48 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>If your implementation benchmarks and tests well then of course we would accept it.  I don&#39;t know if anyone will have time before the next beta (on Friday).</p><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>
And while I&#39;m here posting, could there not be some compile time way of determining whether real fixed point or nasty triple cast fixed point implementations of things like fixmul are available?
</p></div></div><p>

I don&#39;t think so.</p><p>EDIT: Ok, I decided to do some quick benchmarks before going to sleep.  The patch I&#39;m using is attached.  The test machine is a Pentium-4 2.4GHz, running Linux 2.6.10, gcc 3.3.4.  I&#39;m using the &quot;Misc | Time some stuff&quot; function from `tests/test&#39;.  Here are the rather surprising results:</p><div class="source-code snippet"><div class="inner"><pre>Standard mode  (fixmuls per second)

asm:            25886122, 26106294
current C:      29301026, 29226952
patched C:      31416471, 31518915

----------------------------------------------------------------------

Debug mode  (fixmuls per second)

asm:            22182461, 22209498
current C:       9916082,  9649226
patched C:      21665953, 21693887

======================================================================

2005-05-17

Standard mode                   fixmul/sec              fixdiv/sec

asm                             29845165, 29802881      24061238, 23802441
unpatched C                     29462936, 29477438      15233914, 15289430
patched C (32-bit fixmul)       31846917, 31742140      18414104, 18262372
patched C (64-bit fixmul)       27817735, 27833801      18468870, 18479899
</pre></div></div><p>

I didn&#39;t really check that the patched version gives the same results, but the 3d and sprite rotate examples seemed to run ok.</p><p>EDIT: 2005-05-17 added results for Evert&#39;s patch that uses LONG_LONG where possible.  There seems to be a big discrepancy between the two sets of asm results, so I don&#39;t know what happened there.  The C results seem to match up though.  The two &quot;patched C&quot; rows use the same implementation of fixdiv (making use of LONG_LONG).
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Peter Wang)</author>
		<pubDate>Sun, 08 May 2005 18:45:50 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>I didn&#39;t really check that the patched version gives the same results, but the 3d and sprite rotate examples seemed to run ok.</p></div></div><p>
It&#39;s unlikely to give exactly the same results as the three-cast method, but then so is the x86 32x32-&gt;64 integer multiply. I&#39;d be surprised if the average divergence in results was substantially different though.
</p><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>Here are the rather surprising results:</p></div></div><p>
So in standard mode the patch is fastest, in debug the asm is fastest and the three cast method mysteriously overtakes the asm in standard mode? That is a little strange.</p><p>Perhaps someone smarter than me can eliminate the sign stuff in my method. The problem with signed numbers is the &amp;0xff&#39;s throw away the sign. If that could be avoided then all the conditionals could be removed and things should be even better.</p><p>Possibly there is some similar method for fixdiv too...
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Thomas Harte)</author>
		<pubDate>Sun, 08 May 2005 20:07:22 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>I remember reading that back in the 68000 days Macs had a div and mult opcode; powerpc doesn&#39;t?  Mips risc does...</p><p>I&#39;m not clear on why shifts and &amp;&#39;s are needed; if you multiply two 32 bit values the result will be 64 bits; this result requires an arithmetic right shift of 16 (for 16.16 format) and truncation to 32 bits; the sign should be preserved.  For division again only arithmetic shifts are required preserving the sign.  And it doesn&#39;t have to be 16.16 either....</p><p>edit; here&#39;s some code;</p><div class="source-code snippet"><div class="inner"><pre><span class="k1">int</span> mul<span class="k2">(</span><span class="k1">int</span> a, <span class="k1">int</span> b<span class="k2">)</span> <span class="k2">{</span>
  <span class="k1">long</span> t<span class="k2">;</span>
  t <span class="k3">=</span> a <span class="k3">*</span> b<span class="k2">;</span>
  <span class="k1">return</span> <span class="k2">(</span><span class="k1">int</span><span class="k2">)</span>t&gt;&gt;16<span class="k2">;</span>
<span class="k2">}</span>

<span class="k1">int</span> div<span class="k2">(</span><span class="k1">int</span> a, <span class="k1">int</span> b<span class="k2">)</span> <span class="k2">{</span>
  <span class="k1">long</span> t<span class="k2">;</span>
  t <span class="k3">=</span> a<span class="k2">;</span>
  t <span class="k3">=</span> t<span class="k3">&lt;</span><span class="k3">&lt;</span><span class="n">16</span><span class="k2">;</span>
  t <span class="k3">=</span> t<span class="k3">/</span>b<span class="k2">;</span>
  t <span class="k3">=</span> t&gt;&gt;16<span class="k2">;</span>
  <span class="k1">return</span> <span class="k2">(</span><span class="k1">int</span><span class="k2">)</span>t<span class="k2">;</span>
<span class="k2">}</span>
</pre></div></div><p>

edit2; hmm, looks like you&#39;ll lose the sign of a in the div() function...
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (nonnus29)</author>
		<pubDate>Sun, 08 May 2005 20:22:52 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>remember that long is only 32bits in c</p><p>Marcello
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Marcello)</author>
		<pubDate>Sun, 08 May 2005 21:19:38 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>will this work on 64bits?</p><p>BTW, I have a 64bit gentoo compile, I can test stuff for you guys.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (BAF)</author>
		<pubDate>Sun, 08 May 2005 21:31:08 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title">Marcello said:</div><div class="quote"><p>remember that long is only 32bits in c
</p></div></div><p>
Depends on your platform and compiler. It&#39;s 64 bit on my AMD machine, and possibly on 64 bit Macs as well.<br />Not sure if that helps though...
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Evert)</author>
		<pubDate>Sun, 08 May 2005 21:35:34 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title">Marcello said:</div><div class="quote"><p>remember that long is only 32bits in c</p></div></div><p>
Actually, it&#39;s &#8805; 32 bits, &#8805; int...
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (gnolam)</author>
		<pubDate>Sun, 08 May 2005 21:36:19 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>right, point being, don&#39;t use it if you want 64bits.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Marcello)</author>
		<pubDate>Sun, 08 May 2005 21:48:26 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>right, point being, don&#39;t use it if you want 64bits.</p></div></div><p>
Don&#39;t use what?</p><p>Thinking about it a little harder, if the pure C fixmul is casting to floats, then it is losing the low 8 bits of each fixed number before multiplying anyway. Therefore the following should be just as good:</p><p>(x &gt;&gt; 8)*(y &gt;&gt; 8)</p><p>Conversely, if it is casting to doubles then it is no wonder that speed isn&#39;t great!<br />EDIT:
</p><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>I remember reading that back in the 68000 days Macs had a div and mult opcode; powerpc doesn&#39;t? Mips risc does...</p></div></div><p>
Why do you suppose PowerPC has no multiply or divide operations? I can see no reason you would conclude that from the thread.</p><p>In fact the PowerPC has a 32x32-&gt;64 multiply capacity even in 32bit mode, through the issuing of two consecutive instructions. Otherwise you only get a more normal 32x32-&gt;32 multiply, what with RISC machines being at least partly designed for RISC usage.</p><p>I&#39;ll see if I can figure out the correct ASM and so on for PowerPC but it is at least slightly beside the point if the pure-C can still be improved.
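</p><p>Going back to the (x &gt;&gt; 8)*(y &gt;&gt; 8) shortcut for a moment: one rough way to see what it gives up is to compare it against a 64-bit reference. This is only a sketch. It assumes int64_t is available and that &gt;&gt; on a negative int is an arithmetic shift (implementation-defined in C, though common), and the function names are invented for the example. The shortcut drops the low 8 bits of both operands unconditionally, whereas a float only runs out of mantissa for large magnitudes, so the two are not quite equivalent for small values.

```c
#include <stdint.h>

/* Assumption: 16.16 fixed point stored in a 32-bit signed integer. */
typedef int32_t fixed;

/* Reference: full 64-bit product, then drop the extra 16 fraction bits. */
static fixed fixmul_ref(fixed x, fixed y)
{
    return (fixed)(((int64_t)x * (int64_t)y) >> 16);
}

/* The shortcut: pre-shift both operands so the product fits in 32 bits.
 * The low 8 bits of each operand never take part in the multiply. */
static fixed fixmul_coarse(fixed x, fixed y)
{
    return (x >> 8) * (y >> 8);
}

/* How far the shortcut is from the reference for one pair of inputs. */
static int32_t fixmul_coarse_error(fixed x, fixed y)
{
    return fixmul_ref(x, y) - fixmul_coarse(x, y);
}
```

For example, with x = 0xFF and y = 0x10000 (1.0) the reference returns 0xFF while the shortcut returns 0, because all of x lives in the discarded low byte.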
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Thomas Harte)</author>
		<pubDate>Sun, 08 May 2005 22:03:17 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Oops;  &#39;long long&#39; is 64 bits or is it &#39;long int&#39;?</p><p>edit; I assumed that if multiply and divide opcodes were available then they would&#39;ve been used in allegro; guess not...
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (nonnus29)</author>
		<pubDate>Sun, 08 May 2005 22:09:53 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Depends on your compiler and processor. You may not even have a 64bit integral type.</p><p>EDIT:
</p><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>edit; I assumed that if multiply and divide opcodes were available then they would&#39;ve been used in allegro; guess not... </p></div></div><p>
Oh, no, the Mac OS X port just isn&#39;t complete yet; there are things it could be doing to avoid the Allegro C fallbacks that it currently isn&#39;t.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Thomas Harte)</author>
		<pubDate>Sun, 08 May 2005 22:11:23 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>In fact the PowerPC has a 32x32-&gt;64 multiply capacity even in 32bit mode</p></div></div><p>
So do the i386+ processors, if I recall correctly (they set the lower word in eax and the high word in edx, I think).</p><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>Oops; &#39;long long&#39; is 64 bits or is it &#39;long int&#39;?</p></div></div><p>
long long <i>may</i> be 64 bit if it is defined at all (prior to C99, it was a GNU extension). Allegro defines a (u)int64_t on non-C99 compilers, but it&#39;s just a guess and probably unreliable on non-C99 compilers.</p><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>I assumed that if multiply and divide opcodes were available then they would&#39;ve been used in allegro</p></div></div><p>
Allegro only uses assembler code on the ix86, which is actually suboptimal on recent processors. There&#39;s some minor assembler code for AMD64, but it&#39;s trivial and mostly unimportant (it retrieves the CPUID flags).
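</p><p>On the narrower question of whether a 64-bit type is there at all, one option with a C99 compiler is to test LLONG_MAX from limits.h at compile time. The sketch below is not how Allegro itself decides; FIXMUL_HAS_INT64 and fixmul_portable are hypothetical names, and the floating point branch is just a stand-in for the existing fallback.

```c
#include <limits.h>

/* Assumption: 16.16 fixed point stored in an int, as in this thread. */
typedef int fixed;

#if defined(LLONG_MAX)
   /* C99 (or later) limits.h defines LLONG_MAX, so long long exists
    * and is at least 64 bits wide. */
   #define FIXMUL_HAS_INT64 1   /* hypothetical macro name */

   static fixed fixmul_portable(fixed x, fixed y)
   {
      return (fixed)(((long long)x * (long long)y) >> 16);
   }
#else
   /* No long long known at compile time: fall back to floating point.
    * Note this truncates toward zero instead of rounding down. */
   #define FIXMUL_HAS_INT64 0

   static fixed fixmul_portable(fixed x, fixed y)
   {
      return (fixed)(((double)x * (double)y) / 65536.0);
   }
#endif
```

User code could then check FIXMUL_HAS_INT64 with #if to find out which path was compiled in.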
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Evert)</author>
		<pubDate>Sun, 08 May 2005 22:22:02 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>These work with mingw 3.2;</p><div class="source-code snippet"><div class="inner"><pre><span class="c">/* 16.16 multiply: widen to 64 bits, then drop the extra 16 fraction bits */</span>
<span class="k1">int</span> mul16_16<span class="k2">(</span><span class="k1">int</span> a, <span class="k1">int</span> b<span class="k2">)</span> <span class="k2">{</span>
    <span class="k1">long</span> <span class="k1">long</span> <span class="k1">int</span> t <span class="k3">=</span> <span class="k2">(</span><span class="k2">(</span><span class="k1">long</span> <span class="k1">long</span> <span class="k1">int</span><span class="k2">)</span>a <span class="k3">*</span> <span class="k2">(</span><span class="k1">long</span> <span class="k1">long</span> <span class="k1">int</span><span class="k2">)</span>b<span class="k2">)</span> <span class="k3">&gt;</span><span class="k3">&gt;</span> <span class="n">16</span><span class="k2">;</span>
    <span class="k1">return</span> <span class="k2">(</span><span class="k1">int</span><span class="k2">)</span>t<span class="k2">;</span>
<span class="k2">}</span>

<span class="c">/* 16.16 divide: a is shifted up by 32, so the quotient carries 16 extra bits that are shifted back out */</span>
<span class="k1">int</span> div16_16<span class="k2">(</span><span class="k1">int</span> a, <span class="k1">int</span> b<span class="k2">)</span> <span class="k2">{</span>
    <span class="k1">long</span> <span class="k1">long</span> <span class="k1">int</span> t <span class="k3">=</span> <span class="k2">(</span><span class="k1">long</span> <span class="k1">long</span> <span class="k1">int</span><span class="k2">)</span>a <span class="k3">&lt;</span><span class="k3">&lt;</span> <span class="n">32</span><span class="k2">;</span>
    <span class="k1">return</span> <span class="k2">(</span><span class="k1">int</span><span class="k2">)</span><span class="k2">(</span><span class="k2">(</span>t<span class="k3">/</span>b<span class="k2">)</span><span class="k3">&gt;</span><span class="k3">&gt;</span><span class="n">16</span><span class="k2">)</span><span class="k2">;</span>
<span class="k2">}</span>

<span class="c">/* prints the integer part, then the raw 16-bit fraction (0..65535), not decimal digits */</span>
<span class="k1">void</span> print16_16<span class="k2">(</span><span class="k1">int</span> a<span class="k2">)</span> <span class="k2">{</span>
    <span class="k1">int</span> t <span class="k3">=</span> a <span class="k3">&gt;</span><span class="k3">&gt;</span> <span class="n">16</span><span class="k2">;</span>
    <a href="http://www.delorie.com/djgpp/doc/libc/libc_624.html" target="_blank">printf</a><span class="k2">(</span><span class="s">"\n%i."</span>,t<span class="k2">)</span><span class="k2">;</span>
    t <span class="k3">=</span> a <span class="k3">&amp;</span> <span class="n">0x0000FFFF</span><span class="k2">;</span>
    <a href="http://www.delorie.com/djgpp/doc/libc/libc_624.html" target="_blank">printf</a><span class="k2">(</span><span class="s">"%i"</span>,t<span class="k2">)</span><span class="k2">;</span>
<span class="k2">}</span>
</pre></div></div><p>
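A slightly more defensive variation on the divide and print routines, as a sketch only: it assumes stdint.h is available, the _sketch names are made up, and the caller is expected to make sure b is not zero. Scaling by 65536 with a multiply avoids left-shifting a negative value (undefined in C), and printing through a double shows the fraction as decimal digits.

```c
#include <stdint.h>
#include <stdio.h>

/* 16.16 divide: scale the dividend up by 2^16 in 64 bits, then divide.
 * Assumption: b != 0 and the quotient fits in 32 bits. */
static int32_t div16_16_sketch(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * 65536) / b);
}

/* Print a 16.16 value as an ordinary decimal number. */
static void print16_16_sketch(int32_t a)
{
    printf("%f\n", a / 65536.0);
}
```

For instance, div16_16_sketch(0x60000, 0x20000) is 6.0 divided by 2.0 and returns 0x30000.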
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (nonnus29)</author>
		<pubDate>Sun, 08 May 2005 23:45:03 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>So do the i386+ processors, if I recall correctly (they set the lower word in eax and the high word in edx, I think).</p></div></div><p>
Yes, all I meant was &#39;even the RISC PowerPC&#39; rather than &#39;unlike the yucky Intel&#39;. I&#39;m not surprised at all that the CISC chip does.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Thomas Harte)</author>
		<pubDate>Mon, 09 May 2005 23:41:56 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>I added some results for Evert&#39;s latest patch, which uses LONG_LONG where possible.  See the table above.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Peter Wang)</author>
		<pubDate>Tue, 17 May 2005 05:05:46 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>n/m I&#39;m smoking crack or something.</p><p>It&#39;s a little faster it seems.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (nonnus29)</author>
		<pubDate>Tue, 17 May 2005 07:49:40 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Interesting...<br />I used the following programme to time the speed of the different methods (except the asm version, which I can&#39;t check easily and am not really interested in anyway <img src="http://www.allegro.cc/forums/smileys/wink.gif" alt=";)" />) on two machines, my 32-bit Xeon workstation (running RedHat Enterprise) and my AMD64 computer at home (running Gentoo linux).</p><div class="source-code"><div class="toolbar"></div><div class="inner"><table width="100%"><tbody><tr><td class="number">1</td><td><span class="p">#include &lt;stdio.h&gt;</span></td></tr><tr><td class="number">2</td><td><span class="p">#include &lt;time.h&gt;</span></td></tr><tr><td class="number">3</td><td><span class="p">#include &lt;sys/time.h&gt;</span></td></tr><tr><td class="number">4</td><td><span class="p">#include &lt;sys/resource.h&gt;</span></td></tr><tr><td class="number">5</td><td><span class="p">#include &lt;sys/types.h&gt;</span></td></tr><tr><td class="number">6</td><td>&#160;</td></tr><tr><td class="number">7</td><td><span class="p">#define CPUTIME (getrusage(RUSAGE_SELF,&amp;ruse),\</span></td></tr><tr><td class="number">8</td><td><span class="p">  ruse.ru_utime.tv_sec + ruse.ru_stime.tv_sec + \ </span></td></tr><tr><td class="number">9</td><td><span class="p">  1e-6 * (ruse.ru_utime.tv_usec + ruse.ru_stime.tv_usec)) </span></td></tr><tr><td class="number">10</td><td>&#160;</td></tr><tr><td class="number">11</td><td><span class="k1">struct</span> rusage ruse<span class="k2">;</span></td></tr><tr><td class="number">12</td><td>&#160;</td></tr><tr><td class="number">13</td><td><span class="k1">extern</span> <span class="k1">int</span> <a href="http://www.delorie.com/djgpp/doc/libc/libc_416.html" target="_blank">getrusage</a><span class="k2">(</span><span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">14</td><td>&#160;</td></tr><tr><td class="number">15</td><td><span 
class="k1">typedef</span> <span class="k1">int</span> <a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a><span class="k2">;</span></td></tr><tr><td class="number">16</td><td>&#160;</td></tr><tr><td class="number">17</td><td><span class="p">#define LOOP_COUNT   100000000</span></td></tr><tr><td class="number">18</td><td>&#160;</td></tr><tr><td class="number">19</td><td><span class="c">/* ftofix and fixtof are used in generic C versions of fixmul and fixdiv */</span></td></tr><tr><td class="number">20</td><td><span class="k1">inline</span> <a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a> <a href="http://www.allegro.cc/manual/ftofix" target="_blank"><span class="a">ftofix</span></a><span class="k2">(</span><span class="k1">double</span> x<span class="k2">)</span></td></tr><tr><td class="number">21</td><td><span class="k2">{</span></td></tr><tr><td class="number">22</td><td>   <span class="k1">if</span> <span class="k2">(</span>x <span class="k3">&gt;</span> <span class="n">32767</span>.<span class="n">0</span><span class="k2">)</span> <span class="k2">{</span></td></tr><tr><td class="number">23</td><td>      <span class="k1">return</span> <span class="n">0x7FFFFFFF</span><span class="k2">;</span></td></tr><tr><td class="number">24</td><td>   <span class="k2">}</span></td></tr><tr><td class="number">25</td><td>&#160;</td></tr><tr><td class="number">26</td><td>   <span class="k1">if</span> <span class="k2">(</span>x <span class="k3">&lt;</span> <span class="k3">-</span><span class="n">32767</span>.<span class="n">0</span><span class="k2">)</span> <span class="k2">{</span></td></tr><tr><td class="number">27</td><td>      <span class="k1">return</span> <span class="k3">-</span><span class="n">0x7FFFFFFF</span><span class="k2">;</span></td></tr><tr><td class="number">28</td><td>   <span class="k2">}</span></td></tr><tr><td class="number">29</td><td>&#160;</td></tr><tr><td 
class="number">30</td><td>   <span class="k1">return</span> <span class="k2">(</span><span class="k1">int</span><span class="k2">)</span><span class="k2">(</span>x <span class="k3">*</span> <span class="n">65536</span>.<span class="n">0</span> <span class="k3">+</span> <span class="k2">(</span>x <span class="k3">&lt;</span> <span class="n">0</span> ? <span class="k3">-</span><span class="n">0</span>.<span class="n">5</span> <span class="k2">:</span> <span class="n">0</span>.<span class="n">5</span><span class="k2">)</span><span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">31</td><td><span class="k2">}</span></td></tr><tr><td class="number">32</td><td>&#160;</td></tr><tr><td class="number">33</td><td>&#160;</td></tr><tr><td class="number">34</td><td><span class="k1">inline</span> <span class="k1">double</span> <a href="http://www.allegro.cc/manual/fixtof" target="_blank"><span class="a">fixtof</span></a><span class="k2">(</span><a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a> x<span class="k2">)</span></td></tr><tr><td class="number">35</td><td><span class="k2">{</span></td></tr><tr><td class="number">36</td><td>   <span class="k1">return</span> <span class="k2">(</span><span class="k1">double</span><span class="k2">)</span>x <span class="k3">/</span> <span class="n">65536</span>.<span class="n">0</span><span class="k2">;</span></td></tr><tr><td class="number">37</td><td><span class="k2">}</span></td></tr><tr><td class="number">38</td><td>&#160;</td></tr><tr><td class="number">39</td><td><span class="k1">inline</span> <a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a> fixmulf<span class="k2">(</span><a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a> x, <a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a> y<span class="k2">)</span></td></tr><tr><td 
class="number">40</td><td><span class="k2">{</span></td></tr><tr><td class="number">41</td><td>   <span class="k1">return</span> <a href="http://www.allegro.cc/manual/ftofix" target="_blank"><span class="a">ftofix</span></a><span class="k2">(</span><a href="http://www.allegro.cc/manual/fixtof" target="_blank"><span class="a">fixtof</span></a><span class="k2">(</span>x<span class="k2">)</span> <span class="k3">*</span> <a href="http://www.allegro.cc/manual/fixtof" target="_blank"><span class="a">fixtof</span></a><span class="k2">(</span>y<span class="k2">)</span><span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">42</td><td><span class="k2">}</span></td></tr><tr><td class="number">43</td><td>&#160;</td></tr><tr><td class="number">44</td><td><span class="k1">inline</span> <a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a> fixmull<span class="k2">(</span><a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a> x, <a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a> y<span class="k2">)</span></td></tr><tr><td class="number">45</td><td><span class="k2">{</span></td></tr><tr><td class="number">46</td><td>   <span class="k1">long</span> <span class="k1">long</span> lx <span class="k3">=</span> x<span class="k2">;</span></td></tr><tr><td class="number">47</td><td>   <span class="k1">long</span> <span class="k1">long</span> ly <span class="k3">=</span> y<span class="k2">;</span></td></tr><tr><td class="number">48</td><td>   <span class="k1">long</span> <span class="k1">long</span> lres <span class="k3">=</span> <span class="k2">(</span>lx<span class="k3">*</span>ly<span class="k2">)</span><span class="k3">&gt;</span><span class="k3">&gt;</span><span class="n">16</span><span class="k2">;</span></td></tr><tr><td class="number">49</td><td>   <span class="k1">int</span> res <span class="k3">=</span> lres<span 
class="k2">;</span></td></tr><tr><td class="number">50</td><td>   <span class="k1">return</span> res<span class="k2">;</span></td></tr><tr><td class="number">51</td><td><span class="k2">}</span></td></tr><tr><td class="number">52</td><td>&#160;</td></tr><tr><td class="number">53</td><td><span class="k1">inline</span> <a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a> fixmuli<span class="k2">(</span><a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a> x, <a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a> y<span class="k2">)</span></td></tr><tr><td class="number">54</td><td><span class="k2">{</span></td></tr><tr><td class="number">55</td><td>   <a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a> sign <span class="k3">=</span> <span class="k2">(</span>x^y<span class="k2">)</span> <span class="k3">&amp;</span> <span class="n">0x80000000</span><span class="k2">;</span></td></tr><tr><td class="number">56</td><td>   <span class="k1">int</span> mask_x <span class="k3">=</span> x <span class="k3">&gt;</span><span class="k3">&gt;</span> <span class="n">31</span><span class="k2">;</span></td></tr><tr><td class="number">57</td><td>   <span class="k1">int</span> mask_y <span class="k3">=</span> y <span class="k3">&gt;</span><span class="k3">&gt;</span> <span class="n">31</span><span class="k2">;</span></td></tr><tr><td class="number">58</td><td>   <span class="k1">int</span> mask_result <span class="k3">=</span> sign <span class="k3">&gt;</span><span class="k3">&gt;</span> <span class="n">31</span><span class="k2">;</span></td></tr><tr><td class="number">59</td><td>   <a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a> result<span class="k2">;</span></td></tr><tr><td class="number">60</td><td>&#160;</td></tr><tr><td class="number">61</td><td>   x <span 
class="k3">=</span> <span class="k2">(</span>x^mask_x<span class="k2">)</span> <span class="k3">-</span> mask_x<span class="k2">;</span></td></tr><tr><td class="number">62</td><td>   y <span class="k3">=</span> <span class="k2">(</span>y^mask_y<span class="k2">)</span> <span class="k3">-</span> mask_y<span class="k2">;</span></td></tr><tr><td class="number">63</td><td>&#160;</td></tr><tr><td class="number">64</td><td>   result <span class="k3">=</span> <span class="k2">(</span><span class="k2">(</span>y <span class="k3">&gt;</span><span class="k3">&gt;</span> <span class="n">8</span><span class="k2">)</span><span class="k3">*</span><span class="k2">(</span>x <span class="k3">&gt;</span><span class="k3">&gt;</span> <span class="n">8</span><span class="k2">)</span> <span class="k3">+</span></td></tr><tr><td class="number">65</td><td>             <span class="k2">(</span><span class="k2">(</span><span class="k2">(</span>y <span class="k3">&gt;</span><span class="k3">&gt;</span> <span class="n">8</span><span class="k2">)</span><span class="k3">*</span><span class="k2">(</span>x<span class="k3">&amp;</span><span class="n">0xff</span><span class="k2">)</span><span class="k2">)</span> <span class="k3">&gt;</span><span class="k3">&gt;</span> <span class="n">8</span><span class="k2">)</span> <span class="k3">+</span></td></tr><tr><td class="number">66</td><td>             <span class="k2">(</span><span class="k2">(</span><span class="k2">(</span>x <span class="k3">&gt;</span><span class="k3">&gt;</span> <span class="n">8</span><span class="k2">)</span><span class="k3">*</span><span class="k2">(</span>y<span class="k3">&amp;</span><span class="n">0xff</span><span class="k2">)</span><span class="k2">)</span> <span class="k3">&gt;</span><span class="k3">&gt;</span> <span class="n">8</span><span class="k2">)</span><span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">67</td><td>&#160;</td></tr><tr><td class="number">68</td><td>   <span 
class="k1">return</span> <span class="k2">(</span>result^mask_result<span class="k2">)</span> <span class="k3">-</span> mask_result<span class="k2">;</span></td></tr><tr><td class="number">69</td><td><span class="k2">}</span></td></tr><tr><td class="number">70</td><td>&#160;</td></tr><tr><td class="number">71</td><td><span class="k1">inline</span> <a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a> fixdivf<span class="k2">(</span><a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a> x, <a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a> y<span class="k2">)</span></td></tr><tr><td class="number">72</td><td><span class="k2">{</span></td></tr><tr><td class="number">73</td><td>   <span class="k1">if</span> <span class="k2">(</span>y <span class="k3">=</span><span class="k3">=</span> <span class="n">0</span><span class="k2">)</span> <span class="k2">{</span></td></tr><tr><td class="number">74</td><td>      <span class="k1">return</span> <span class="k2">(</span>x <span class="k3">&lt;</span> <span class="n">0</span><span class="k2">)</span> ? 
<span class="k3">-</span><span class="n">0x7FFFFFFF</span> <span class="k2">:</span> <span class="n">0x7FFFFFFF</span><span class="k2">;</span></td></tr><tr><td class="number">75</td><td>   <span class="k2">}</span></td></tr><tr><td class="number">76</td><td>   <span class="k1">else</span></td></tr><tr><td class="number">77</td><td>      <span class="k1">return</span> <a href="http://www.allegro.cc/manual/ftofix" target="_blank"><span class="a">ftofix</span></a><span class="k2">(</span><a href="http://www.allegro.cc/manual/fixtof" target="_blank"><span class="a">fixtof</span></a><span class="k2">(</span>x<span class="k2">)</span> <span class="k3">/</span> <a href="http://www.allegro.cc/manual/fixtof" target="_blank"><span class="a">fixtof</span></a><span class="k2">(</span>y<span class="k2">)</span><span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">78</td><td><span class="k2">}</span></td></tr><tr><td class="number">79</td><td>&#160;</td></tr><tr><td class="number">80</td><td><span class="k1">inline</span> <a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a> fixdivl<span class="k2">(</span><a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a> x, <a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a> y<span class="k2">)</span></td></tr><tr><td class="number">81</td><td><span class="k2">{</span></td></tr><tr><td class="number">82</td><td>   <span class="k1">if</span> <span class="k2">(</span>y <span class="k3">=</span><span class="k3">=</span> <span class="n">0</span><span class="k2">)</span> <span class="k2">{</span></td></tr><tr><td class="number">83</td><td>      <span class="k1">return</span> <span class="k2">(</span>x <span class="k3">&lt;</span> <span class="n">0</span><span class="k2">)</span> ? 
<span class="k3">-</span><span class="n">0x7FFFFFFF</span> <span class="k2">:</span> <span class="n">0x7FFFFFFF</span><span class="k2">;</span></td></tr><tr><td class="number">84</td><td>   <span class="k2">}</span></td></tr><tr><td class="number">85</td><td>   <span class="k1">else</span> <span class="k2">{</span></td></tr><tr><td class="number">86</td><td>      <span class="k1">long</span> <span class="k1">long</span> lx <span class="k3">=</span> x<span class="k2">;</span></td></tr><tr><td class="number">87</td><td>      <span class="k1">long</span> <span class="k1">long</span> ly <span class="k3">=</span> y<span class="k2">;</span></td></tr><tr><td class="number">88</td><td>      <span class="k1">long</span> <span class="k1">long</span> lres <span class="k3">=</span> <span class="k2">(</span>lx <span class="k3">&lt;</span><span class="k3">&lt;</span> <span class="n">16</span><span class="k2">)</span> <span class="k3">/</span> ly<span class="k2">;</span></td></tr><tr><td class="number">89</td><td>      <span class="k1">int</span> res <span class="k3">=</span> lres<span class="k2">;</span></td></tr><tr><td class="number">90</td><td>      <span class="k1">return</span> res<span class="k2">;</span></td></tr><tr><td class="number">91</td><td>   <span class="k2">}</span></td></tr><tr><td class="number">92</td><td><span class="k2">}</span></td></tr><tr><td class="number">93</td><td>&#160;</td></tr><tr><td class="number">94</td><td><span class="k1">int</span> main<span class="k2">(</span><span class="k1">void</span><span class="k2">)</span></td></tr><tr><td class="number">95</td><td><span class="k2">{</span></td></tr><tr><td class="number">96</td><td>   <span class="k1">double</span> t0, t1<span class="k2">;</span></td></tr><tr><td class="number">97</td><td>   time_t u1,u2<span class="k2">;</span></td></tr><tr><td class="number">98</td><td>   <a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a> x, y, z<span 
class="k2">;</span></td></tr><tr><td class="number">99</td><td>   <span class="k1">int</span> c<span class="k2">;</span></td></tr><tr><td class="number">100</td><td>   </td></tr><tr><td class="number">101</td><td>   <a href="http://www.delorie.com/djgpp/doc/libc/libc_624.html" target="_blank">printf</a><span class="k2">(</span><span class="s">"sizeof(long long) = %d, sizeof(fixed) = %d\n"</span>, <span class="k1">sizeof</span><span class="k2">(</span><span class="k1">long</span> <span class="k1">long</span><span class="k2">)</span>, <span class="k1">sizeof</span><span class="k2">(</span><a href="http://www.allegro.cc/manual/fixed" target="_blank"><span class="a">fixed</span></a><span class="k2">)</span><span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">102</td><td>&#160;</td></tr><tr><td class="number">103</td><td>   <a href="http://www.delorie.com/djgpp/doc/libc/libc_624.html" target="_blank">printf</a><span class="k2">(</span><span class="s">"Timing %d calls to fixmulf..."</span>, LOOP_COUNT<span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">104</td><td>   <a href="http://www.delorie.com/djgpp/doc/libc/libc_315.html" target="_blank">fflush</a><span class="k2">(</span>stdout<span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">105</td><td>   t0 <span class="k3">=</span> CPUTIME<span class="k2">;</span></td></tr><tr><td class="number">106</td><td>   <span class="c">/* place code to be timed here */</span></td></tr><tr><td class="number">107</td><td>   <span class="k1">for</span> <span class="k2">(</span>c<span class="k3">=</span><span class="n">0</span><span class="k2">;</span> c<span class="k3">&lt;</span>LOOP_COUNT<span class="k2">;</span> c<span class="k3">+</span><span class="k3">+</span><span class="k2">)</span> <span class="k2">{</span></td></tr><tr><td class="number">108</td><td>      z <span class="k3">=</span> fixmulf<span class="k2">(</span>x,y<span 
class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">109</td><td>      x <span class="k3">+</span><span class="k3">=</span> <span class="n">1317</span><span class="k2">;</span></td></tr><tr><td class="number">110</td><td>      y <span class="k3">+</span><span class="k3">=</span> <span class="n">7143</span><span class="k2">;</span></td></tr><tr><td class="number">111</td><td>   <span class="k2">}</span></td></tr><tr><td class="number">112</td><td>   t1 <span class="k3">=</span> CPUTIME<span class="k2">;</span></td></tr><tr><td class="number">113</td><td>   <a href="http://www.delorie.com/djgpp/doc/libc/libc_624.html" target="_blank">printf</a><span class="k2">(</span><span class="s">"CPU time = %f secs.\n"</span>,t1-t0<span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">114</td><td>&#160;</td></tr><tr><td class="number">115</td><td>   <a href="http://www.delorie.com/djgpp/doc/libc/libc_624.html" target="_blank">printf</a><span class="k2">(</span><span class="s">"Timing %d calls to fixmuli..."</span>, LOOP_COUNT<span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">116</td><td>   <a href="http://www.delorie.com/djgpp/doc/libc/libc_315.html" target="_blank">fflush</a><span class="k2">(</span>stdout<span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">117</td><td>   t0 <span class="k3">=</span> CPUTIME<span class="k2">;</span></td></tr><tr><td class="number">118</td><td>   <span class="c">/* place code to be timed here */</span></td></tr><tr><td class="number">119</td><td>   <span class="k1">for</span> <span class="k2">(</span>c<span class="k3">=</span><span class="n">0</span><span class="k2">;</span> c<span class="k3">&lt;</span>LOOP_COUNT<span class="k2">;</span> c<span class="k3">+</span><span class="k3">+</span><span class="k2">)</span> <span class="k2">{</span></td></tr><tr><td class="number">120</td><td>      z <span class="k3">=</span> fixmuli<span 
class="k2">(</span>x,y<span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">121</td><td>      x <span class="k3">+</span><span class="k3">=</span> <span class="n">1317</span><span class="k2">;</span></td></tr><tr><td class="number">122</td><td>      y <span class="k3">+</span><span class="k3">=</span> <span class="n">7143</span><span class="k2">;</span></td></tr><tr><td class="number">123</td><td>   <span class="k2">}</span></td></tr><tr><td class="number">124</td><td>   t1 <span class="k3">=</span> CPUTIME<span class="k2">;</span></td></tr><tr><td class="number">125</td><td>   <a href="http://www.delorie.com/djgpp/doc/libc/libc_624.html" target="_blank">printf</a><span class="k2">(</span><span class="s">"CPU time = %f secs.\n"</span>,t1-t0<span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">126</td><td>&#160;</td></tr><tr><td class="number">127</td><td>   <a href="http://www.delorie.com/djgpp/doc/libc/libc_624.html" target="_blank">printf</a><span class="k2">(</span><span class="s">"Timing %d calls to fixmull..."</span>, LOOP_COUNT<span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">128</td><td>   <a href="http://www.delorie.com/djgpp/doc/libc/libc_315.html" target="_blank">fflush</a><span class="k2">(</span>stdout<span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">129</td><td>   t0 <span class="k3">=</span> CPUTIME<span class="k2">;</span></td></tr><tr><td class="number">130</td><td>   <span class="c">/* place code to be timed here */</span></td></tr><tr><td class="number">131</td><td>   <span class="k1">for</span> <span class="k2">(</span>c<span class="k3">=</span><span class="n">0</span><span class="k2">;</span> c<span class="k3">&lt;</span>LOOP_COUNT<span class="k2">;</span> c<span class="k3">+</span><span class="k3">+</span><span class="k2">)</span> <span class="k2">{</span></td></tr><tr><td class="number">132</td><td>      z <span 
class="k3">=</span> fixmull<span class="k2">(</span>x,y<span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">133</td><td>      x <span class="k3">+</span><span class="k3">=</span> <span class="n">1317</span><span class="k2">;</span></td></tr><tr><td class="number">134</td><td>      y <span class="k3">+</span><span class="k3">=</span> <span class="n">7143</span><span class="k2">;</span></td></tr><tr><td class="number">135</td><td>   <span class="k2">}</span></td></tr><tr><td class="number">136</td><td>   t1 <span class="k3">=</span> CPUTIME<span class="k2">;</span></td></tr><tr><td class="number">137</td><td>   <a href="http://www.delorie.com/djgpp/doc/libc/libc_624.html" target="_blank">printf</a><span class="k2">(</span><span class="s">"CPU time = %f secs.\n"</span>,t1-t0<span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">138</td><td>&#160;</td></tr><tr><td class="number">139</td><td>   <a href="http://www.delorie.com/djgpp/doc/libc/libc_624.html" target="_blank">printf</a><span class="k2">(</span><span class="s">"Timing %d calls to fixdivf..."</span>, LOOP_COUNT<span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">140</td><td>   <a href="http://www.delorie.com/djgpp/doc/libc/libc_315.html" target="_blank">fflush</a><span class="k2">(</span>stdout<span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">141</td><td>   t0 <span class="k3">=</span> CPUTIME<span class="k2">;</span></td></tr><tr><td class="number">142</td><td>   <span class="c">/* place code to be timed here */</span></td></tr><tr><td class="number">143</td><td>   <span class="k1">for</span> <span class="k2">(</span>c<span class="k3">=</span><span class="n">0</span><span class="k2">;</span> c<span class="k3">&lt;</span>LOOP_COUNT<span class="k2">;</span> c<span class="k3">+</span><span class="k3">+</span><span class="k2">)</span> <span class="k2">{</span></td></tr><tr><td 
class="number">144</td><td>      z <span class="k3">=</span> fixdivf<span class="k2">(</span>x,y<span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">145</td><td>      x <span class="k3">+</span><span class="k3">=</span> <span class="n">1317</span><span class="k2">;</span></td></tr><tr><td class="number">146</td><td>      y <span class="k3">+</span><span class="k3">=</span> <span class="n">7143</span><span class="k2">;</span></td></tr><tr><td class="number">147</td><td>   <span class="k2">}</span></td></tr><tr><td class="number">148</td><td>   t1 <span class="k3">=</span> CPUTIME<span class="k2">;</span></td></tr><tr><td class="number">149</td><td>   <a href="http://www.delorie.com/djgpp/doc/libc/libc_624.html" target="_blank">printf</a><span class="k2">(</span><span class="s">"CPU time = %f secs.\n"</span>,t1-t0<span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">150</td><td>&#160;</td></tr><tr><td class="number">151</td><td>   <a href="http://www.delorie.com/djgpp/doc/libc/libc_624.html" target="_blank">printf</a><span class="k2">(</span><span class="s">"Timing %d calls to fixdivl..."</span>, LOOP_COUNT<span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">152</td><td>   <a href="http://www.delorie.com/djgpp/doc/libc/libc_315.html" target="_blank">fflush</a><span class="k2">(</span>stdout<span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">153</td><td>   t0 <span class="k3">=</span> CPUTIME<span class="k2">;</span></td></tr><tr><td class="number">154</td><td>   <span class="c">/* place code to be timed here */</span></td></tr><tr><td class="number">155</td><td>   <span class="k1">for</span> <span class="k2">(</span>c<span class="k3">=</span><span class="n">0</span><span class="k2">;</span> c<span class="k3">&lt;</span>LOOP_COUNT<span class="k2">;</span> c<span class="k3">+</span><span class="k3">+</span><span class="k2">)</span> <span 
class="k2">{</span></td></tr><tr><td class="number">156</td><td>      z <span class="k3">=</span> fixdivl<span class="k2">(</span>x,y<span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">157</td><td>      x <span class="k3">+</span><span class="k3">=</span> <span class="n">1317</span><span class="k2">;</span></td></tr><tr><td class="number">158</td><td>      y <span class="k3">+</span><span class="k3">=</span> <span class="n">7143</span><span class="k2">;</span></td></tr><tr><td class="number">159</td><td>   <span class="k2">}</span></td></tr><tr><td class="number">160</td><td>   t1 <span class="k3">=</span> CPUTIME<span class="k2">;</span></td></tr><tr><td class="number">161</td><td>   <a href="http://www.delorie.com/djgpp/doc/libc/libc_624.html" target="_blank">printf</a><span class="k2">(</span><span class="s">"CPU time = %f secs.\n"</span>,t1-t0<span class="k2">)</span><span class="k2">;</span></td></tr><tr><td class="number">162</td><td>&#160;</td></tr><tr><td class="number">163</td><td>   <span class="k1">return</span> <span class="n">0</span><span class="k2">;</span></td></tr><tr><td class="number">164</td><td><span class="k2">}</span></td></tr></tbody></table></div></div><p>
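A side note on the listing above, as a minimal sketch rather than anything taken from the Allegro source: fixmuli is not bit-exact with fixmull, because it drops the low-byte by low-byte term and truncates each partial product. The version below assumes a 32-bit int32_t and an arithmetic right shift of negative values (what gcc does, though C leaves it implementation-defined). It also computes the sign mask straight from x ^ y instead of going through the 0x80000000 constant, which is my own simplification.</p>

```c
#include <stdint.h>

typedef int32_t fixed;   /* 16.16 fixed point, as in the listing above */

/* 64-bit intermediate: keeps every bit of the product before the shift. */
static fixed fixmull(fixed x, fixed y)
{
   return (fixed)(((int64_t)x * (int64_t)y) >> 16);
}

/* 32-bit only: splits each operand into its high 24 bits and low 8 bits
 * and drops the low*low term, so the last bit can differ from fixmull. */
static fixed fixmuli(fixed x, fixed y)
{
   int32_t mask_x = x >> 31;              /* 0 or -1 (arithmetic shift assumed) */
   int32_t mask_y = y >> 31;
   int32_t mask_result = (x ^ y) >> 31;   /* -1 when the signs differ */
   fixed result;

   x = (x ^ mask_x) - mask_x;             /* branch-free abs() */
   y = (y ^ mask_y) - mask_y;

   result = (y >> 8) * (x >> 8) +
            (((y >> 8) * (x & 0xff)) >> 8) +
            (((x >> 8) * (y & 0xff)) >> 8);

   return (result ^ mask_result) - mask_result;   /* restore the sign */
}
```

<p>With x = y = 0x1FFFF (just under 2.0), fixmull returns 262140 and this fixmuli returns 262139, so the two differ in the last bit. Whether that matters depends on the caller.</p><p>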

On the Xeon, I got the following results:
</p><div class="quote_container"><div class="title">Xeon results said:</div><div class="quote"><p>
sizeof(long long) = 8, sizeof(fixed) = 4<br />Timing 100000000 calls to fixmulf...CPU time = 4.568305 secs.<br />Timing 100000000 calls to fixmuli...CPU time = 2.769579 secs.<br />Timing 100000000 calls to fixmull...CPU time = 3.394484 secs.<br />Timing 100000000 calls to fixdivf...CPU time = 8.174757 secs.<br />Timing 100000000 calls to fixdivl...CPU time = 4.776274 secs.
</p></div></div><p>
I guess this is consistent with what Peter obtained. For my 64 bit machine, the result is<br />sizeof(long long) = 8, sizeof(fixed) = 4
</p><div class="quote_container"><div class="title">AMD64 said:</div><div class="quote"><p>Timing 100000000 calls to fixmulf...CPU time = 4.133372 secs.<br />Timing 100000000 calls to fixmuli...CPU time = 2.085683 secs.<br />Timing 100000000 calls to fixmull...CPU time = 1.460777 secs.<br />Timing 100000000 calls to fixdivf...CPU time = 6.344036 secs.<br />Timing 100000000 calls to fixdivl...CPU time = 5.359185 secs.</p></div></div><p>
As expected, the long long multiplication is the fastest here. I&#39;m puzzled by the poor performance of the long long division code though, both relative and absolute: the 32 bit machine outperforms the 64 bit machine when dividing 64 bit integers... I&#39;ll have to check out what&#39;s wrong there.</p><p>EDIT:<br />Meh, I forgot to pass -O2 for optimizations... raw results with optimizations posted below, will comment later.
</p><div class="quote_container"><div class="title">Xeon said:</div><div class="quote"><p>
sizeof(long long) = 8, sizeof(fixed) = 4<br />Timing 100000000 calls to fixmulf...CPU time = 0.054991 secs.<br />Timing 100000000 calls to fixmuli...CPU time = 0.054992 secs.<br />Timing 100000000 calls to fixmull...CPU time = 0.054991 secs.<br />Timing 100000000 calls to fixdivf...CPU time = 0.067990 secs.<br />Timing 100000000 calls to fixdivl...CPU time = 0.053992 secs.
</p></div></div><p>
</p><div class="quote_container"><div class="title">AMD64 said:</div><div class="quote"><p>
sizeof(long long) = 8, sizeof(fixed) = 4<br />Timing 100000000 calls to fixmulf...CPU time = 0.100984 secs.<br />Timing 100000000 calls to fixmuli...CPU time = 0.100985 secs.<br />Timing 100000000 calls to fixmull...CPU time = 0.103984 secs.<br />Timing 100000000 calls to fixdivf...CPU time = 0.103984 secs.<br />Timing 100000000 calls to fixdivl...CPU time = 0.101984 secs.
</p></div></div><p>
And I never want to hear anyone say that Xeon (or Intel) are no good and AMD is better. <img src="http://www.allegro.cc/forums/smileys/tongue.gif" alt=":P" /></p><p>EDIT2: Did a more detailed run and check, now with a larger number of loop iterations and three machines: the 2.8GHz Xeon-HT (actually dual Xeon-HT), the AMD64 3200 and a 950MHz Celeron. Times are reproducible with a deviation of about 1%.
</p><div class="quote_container"><div class="title">results said:</div><div class="quote"><p>
Xeon:<br />sizeof(long long) = 8, sizeof(fixed) = 4<br />Timing 2000000000 calls to fixmulf...CPU time = 1.092833 secs.<br />Timing 2000000000 calls to fixmuli...CPU time = 1.151825 secs.<br />Timing 2000000000 calls to fixmull...CPU time = 0.085987 secs.<br />Timing 2000000000 calls to fixdivf...CPU time = 0.056992 secs.<br />Timing 2000000000 calls to fixdivl...CPU time = 0.089986 secs.</p><p>Celeron:<br />sizeof(long long) = 8, sizeof(fixed) = 4<br />Timing 2000000000 calls to fixmulf...CPU time = 4.260000 secs.<br />Timing 2000000000 calls to fixmuli...CPU time = 4.230000 secs.<br />Timing 2000000000 calls to fixmull...CPU time = 2.120000 secs.<br />Timing 2000000000 calls to fixdivf...CPU time = 0.850000 secs.<br />Timing 2000000000 calls to fixdivl...CPU time = 2.140000 secs.</p><p>AMD64:<br />sizeof(long long) = 8, sizeof(fixed) = 4<br />Timing 2000000000 calls to fixmulf...CPU time = 2.024693 secs.<br />Timing 2000000000 calls to fixmuli...CPU time = 2.022692 secs.<br />Timing 2000000000 calls to fixmull...CPU time = 2.022693 secs.<br />Timing 2000000000 calls to fixdivf...CPU time = 0.201969 secs.<br />Timing 2000000000 calls to fixdivl...CPU time = 0.101984 secs.
</p></div></div><p>
fixmull is the fastest on all machines (although it makes almost no difference on the AMD64). fixdivl is faster than fixdivf on the AMD64, but slower on the Xeon and about 2.5 times slower on the Celeron.<br />So based on this I would propose to apply the fixmull patch regardless and only use the fixdivl patch on native 64-bit machines (does that include Macs?).<br />I am somewhat concerned that the running time doesn&#39;t scale uniformly with the number of iterations in the loop though...
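</p><p>Reading the proposal as "64-bit multiply everywhere, 64-bit divide only on native 64-bit machines", a rough sketch of the selection could look like the following. The names fixmul_sketch and fixdiv_sketch are made up for illustration, and using the pointer width (UINTPTR_MAX) as a stand-in for "native 64-bit" is my assumption, not how Allegro detects platforms.</p>

```c
#include <stdint.h>

typedef int32_t fixed;   /* 16.16 fixed point */

/* fixmull from the benchmark, used on every target. */
static inline fixed fixmul_sketch(fixed x, fixed y)
{
   return (fixed)(((int64_t)x * (int64_t)y) >> 16);
}

static inline fixed fixdiv_sketch(fixed x, fixed y)
{
   if (y == 0)
      return (x < 0) ? -0x7FFFFFFF : 0x7FFFFFFF;

#if UINTPTR_MAX > 0xFFFFFFFFu
   /* Assumed 64-bit target: the fixdivl route. Multiplying by 65536
    * avoids left-shifting a negative value. Truncates toward zero and
    * does not clamp a quotient that overflows the 16.16 range. */
   return (fixed)(((int64_t)x * 65536) / y);
#else
   /* 32-bit target: the fixdivf route. x / y in double equals
    * fixtof(x) / fixtof(y); the rest is ftofix's clamp and rounding. */
   {
      double q = (double)x / (double)y;
      if (q > 32767.0) return 0x7FFFFFFF;
      if (q < -32767.0) return -0x7FFFFFFF;
      return (fixed)(q * 65536.0 + (q < 0 ? -0.5 : 0.5));
   }
#endif
}
```

<p>Note that the two divide paths are not interchangeable bit for bit: the float route rounds to nearest and clamps, the integer route truncates and does not clamp.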
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Evert)</author>
		<pubDate>Tue, 17 May 2005 15:36:01 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>So the conclusion is, on your high end machines: fixdivl makes some difference for you, nothing else really seems to be significant.</p><p>Results of your program on my 667MHz G4, OS X v10.4.1 (with iTunes and Firefox eating 15% CPU before anything else even gets a look-in):</p><p>sizeof(long long) = 8, sizeof(fixed) = 4<br />Timing 100000000 calls to fixmulf...CPU time = 0.459279 secs.<br />Timing 100000000 calls to fixmuli...CPU time = 0.458020 secs.<br />Timing 100000000 calls to fixmull...CPU time = 0.458326 secs.<br />Timing 100000000 calls to fixdivf...CPU time = 0.457835 secs.<br />Timing 100000000 calls to fixdivl...CPU time = 0.305127 secs.</p><p>Digging into the assembly (compiled with -O2):
</p><div class="quote_container"><div class="title">fixmull said:</div><div class="quote"><p>	.align 2<br />	.globl _fixmull<br />_fixmull:<br />	mulhw r9,r3,r4<br />	mullw r10,r3,r4<br />	srwi r12,r10,16<br />	insrwi r12,r9,16,0<br />	srawi r11,r9,16<br />	mr r3,r12<br />	blr</p></div></div><p>
Looks pretty good to me.
</p><div class="quote_container"><div class="title">fixmuli said:</div><div class="quote"><p>	.align 2<br />	.globl _fixmuli<br />_fixmuli:<br />	srawi r11,r3,31<br />	srawi r10,r4,31<br />	xor r2,r3,r11<br />	xor r0,r4,r10<br />	subf r0,r10,r0<br />	subf r2,r11,r2<br />	mr r9,r3<br />	rlwinm r11,r2,0,24,31<br />	srawi r3,r0,8<br />	srawi r2,r2,8<br />	mullw r11,r3,r11<br />	rlwinm r0,r0,0,24,31<br />	xor r9,r9,r4<br />	srawi r9,r9,31<br />	mullw r0,r2,r0<br />	srawi r11,r11,8<br />	mullw r3,r3,r2<br />	srawi r0,r0,8<br />	add r3,r3,r11<br />	add r3,r3,r0<br />	xor r3,r9,r3<br />	subf r3,r9,r3<br />	blr</p></div></div><p>
Again, not awful.
</p><div class="quote_container"><div class="title">fixmulf said:</div><div class="quote"><p>	.align 2<br />	.globl _fixmulf<br />_fixmulf:<br />	mflr r0<br />	bcl 20,31,&quot;L00000000003$pb&quot;<br />&quot;L00000000003$pb&quot;:<br />	xoris r3,r3,0x8000<br />	xoris r4,r4,0x8000<br />	mflr r10<br />	stw r3,-36(r1)<br />	stw r4,-28(r1)<br />	mtlr r0<br />	lis r0,0x4330<br />	addis r2,r10,ha16(LC9-&quot;L00000000003$pb&quot;)<br />	stw r0,-32(r1)<br />	stw r0,-40(r1)<br />	lfd f11,lo16(LC9-&quot;L00000000003$pb&quot;)(r2)<br />	addis r2,r10,ha16(LC10-&quot;L00000000003$pb&quot;)<br />	lfd f13,-32(r1)<br />	lfd f0,-40(r1)<br />	fsub f13,f13,f11<br />	lfd f12,lo16(LC10-&quot;L00000000003$pb&quot;)(r2)<br />	fsub f0,f0,f11<br />	addis r2,r10,ha16(LC7-&quot;L00000000003$pb&quot;)<br />	fmul f13,f13,f12<br />	fmul f0,f0,f12<br />	fmul f10,f0,f13<br />	lfd f0,lo16(LC7-&quot;L00000000003$pb&quot;)(r2)<br />	fcmpu cr7,f10,f0<br />	bng- cr7,L28<br />	lis r3,0x7fff<br />	ori r3,r3,65535<br />	blr<br />L28:<br />	addis r2,r10,ha16(LC8-&quot;L00000000003$pb&quot;)<br />	lfd f0,lo16(LC8-&quot;L00000000003$pb&quot;)(r2)<br />	fcmpu cr7,f10,f0<br />	bnl+ cr7,L32<br />	lis r3,0x8000<br />	ori r3,r3,1<br />	blr<br />L32:<br />	addis r2,r10,ha16(LC11-&quot;L00000000003$pb&quot;)<br />	fneg f12,f10<br />	lfd f11,lo16(LC11-&quot;L00000000003$pb&quot;)(r2)<br />	addis r2,r10,ha16(LC12-&quot;L00000000003$pb&quot;)<br />	lfd f13,lo16(LC12-&quot;L00000000003$pb&quot;)(r2)<br />	addis r2,r10,ha16(LC13-&quot;L00000000003$pb&quot;)<br />	lfd f0,lo16(LC13-&quot;L00000000003$pb&quot;)(r2)<br />	fsel f13,f10,f11,f13<br />	fsel f12,f12,f13,f11<br />	fmadd f0,f10,f0,f12<br />	fctiwz f13,f0<br />	stfd f13,-24(r1)<br />	lwz r3,-20(r1)<br />	blr<br />	.literal8<br />	.align 3<br />LC14:<br />	.long	1088421824<br />	.long	0<br />	.align 3<br />LC15:<br />	.long	-1059061824<br />	.long	0<br />	.align 3<br />LC16:<br />	.long	1127219200<br />	.long	-2147483648<br />	.align 3<br />LC17:<br />	.long	
1055916032<br />	.long	0<br />	.align 3<br />LC18:<br />	.long	1071644672<br />	.long	0<br />	.align 3<br />LC19:<br />	.long	-1075838976<br />	.long	0<br />	.align 3<br />LC20:<br />	.long	1089470464<br />	.long	0
</p></div></div><p>
Note especially all the stws (store word) and lfds (load floating point double).</p><p>I would guess that all of the artificial examples are getting unrealistically good figures for the float-based multiplications, because the benchmark loop lets the floating point and integer units work in parallel. A real program could use that spare capacity for its own calculations unrelated to the mul. This alone should make fixmuli/l preferable to fixmulf wherever their timings are not substantially worse.
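</p><p>For anyone wondering what the stw/lfd pairs at the top of the fixmulf listing are doing: as far as I can tell they are the usual int-to-double conversion through memory (xoris with 0x8000, a high word of 0x4330, then an fsub against a constant). Here is a minimal C sketch of that trick, assuming IEEE-754 doubles and the same byte order for integers and doubles; the function name is made up for illustration.</p>

```c
#include <stdint.h>
#include <string.h>

static double int_to_double_via_memory(int32_t v)
{
   /* High word 0x43300000 is the exponent field for 2^52; the low word
    * carries v with its sign bit flipped, i.e. v + 2^31 as unsigned. */
   uint64_t bits  = ((uint64_t)0x43300000u << 32) | ((uint32_t)v ^ 0x80000000u);
   uint64_t magic = ((uint64_t)0x43300000u << 32) | 0x80000000u;   /* 2^52 + 2^31 */
   double d, m;

   memcpy(&d, &bits, sizeof d);    /* the stw, stw, lfd round trip */
   memcpy(&m, &magic, sizeof m);   /* the constant fetched with lfd */
   return d - m;                   /* fsub: (2^52 + 2^31 + v) - (2^52 + 2^31) = v */
}
```

<p>The literal pair 1127219200, -2147483648 in the dump is exactly that magic bit pattern (0x43300000, 0x80000000). The point is that every conversion costs two stores and a load before the FPU can even start.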
</p><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>So based on this I would propose to apply the fixmull patch regardless and only use the fixdivl patch on native 64-bit machines (does that include Macs?).</p></div></div><p>
The G5s (around since 2003, I think) support 64bit mode (which had been in the PowerPC spec since about 1993 but unimplemented in any consumer design until a few years ago), and under OS X v10.4+ it is possible to build fat binaries that include 64bit code paths for G5s. Under 10.3 and before, 64bit is only used to provide a larger address space (unless the user writes some asm themselves, of course). My G4 has no native 64bit processing except in the vector unit, and that probably isn&#39;t advisable for non-vectors for all the obvious reasons.</p><p>In any case, fixdivl is substantially faster even on 32bit CPUs for PowerPC!</p><p>EDIT: a true console build and run produces different results (the earlier run was a Cocoa application dumping its results to a log window, that being the quickest way to get an Xcode project going):</p><p>Timing 100000000 calls to fixmulf...CPU time = 0.305241 secs.<br />Timing 100000000 calls to fixmuli...CPU time = 0.455976 secs.<br />Timing 100000000 calls to fixmull...CPU time = 0.303116 secs.<br />Timing 100000000 calls to fixdivf...CPU time = 0.454786 secs.<br />Timing 100000000 calls to fixdivl...CPU time = 0.303217 secs.</p><p>Notably fixmuli now stands out as slower than fixmulf and fixmull. I have no idea why this is, but the results still show very clearly that fixmull and fixdivl are the way forward so I&#39;m not really that interested.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Thomas Harte)</author>
		<pubDate>Tue, 17 May 2005 16:20:31 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Sorry, but I think Evert&#39;s test is deeply flawed.  On my machine, gcc -O2:</p><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>
sizeof(long long) = 8, sizeof(fixed) = 4<br />Timing 200000000 calls to fixmulf...CPU time = 0.113982 secs.<br />Timing 200000000 calls to fixmuli...CPU time = 0.113983 secs.<br />Timing 200000000 calls to fixmull...CPU time = 0.018997 secs.<br />Timing 200000000 calls to fixdivf...CPU time = 0.114982 secs.<br />Timing 200000000 calls to fixdivl...CPU time = 0.019997 secs.
</p></div></div><p>

Now I comment out the call to fixdivl:</p><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>
sizeof(long long) = 8, sizeof(fixed) = 4<br />Timing 200000000 calls to fixmulf...CPU time = 0.113982 secs.<br />Timing 200000000 calls to fixmuli...CPU time = 0.113983 secs.<br />Timing 200000000 calls to fixmull...CPU time = 0.018997 secs.<br />Timing 200000000 calls to fixdivf...CPU time = 0.114982 secs.<br />Timing 200000000 calls to fixdivl...CPU time = 0.112983 secs.
</p></div></div><p>

I basically don&#39;t know assembly, but as far as I can tell we&#39;re just measuring empty loops, and for some reason the fixmull and fixdivl were compiled in such a way that they executed faster.</p><p>If I accumulate the results of the fixmul/fixdiv calls, e.g. like this: (full changed file attached)</p><div class="source-code snippet"><div class="inner"><pre>   <a href="http://www.delorie.com/djgpp/doc/libc/libc_624.html" target="_blank">printf</a><span class="k2">(</span><span class="s">"Timing %d calls to fixdivl..."</span>, LOOP_COUNT<span class="k2">)</span><span class="k2">;</span>
   <a href="http://www.delorie.com/djgpp/doc/libc/libc_315.html" target="_blank">fflush</a><span class="k2">(</span>stdout<span class="k2">)</span><span class="k2">;</span>
   zacc <span class="k3">=</span> <span class="n">0</span><span class="k2">;</span>
   t0 <span class="k3">=</span> CPUTIME<span class="k2">;</span>
   <span class="c">/* place code to be timed here */</span>
   <span class="k1">for</span> <span class="k2">(</span>c<span class="k3">=</span><span class="n">0</span><span class="k2">;</span> c<span class="k3">&lt;</span>LOOP_COUNT<span class="k2">;</span> c<span class="k3">+</span><span class="k3">+</span><span class="k2">)</span> <span class="k2">{</span>
      z <span class="k3">=</span> fixdivl<span class="k2">(</span>x,y<span class="k2">)</span><span class="k2">;</span>
      x <span class="k3">+</span><span class="k3">=</span> <span class="n">1317</span><span class="k2">;</span>
      y <span class="k3">+</span><span class="k3">=</span> <span class="n">7143</span><span class="k2">;</span>
      zacc <span class="k3">+</span><span class="k3">=</span> z<span class="k2">;</span>
   <span class="k2">}</span>
   t1 <span class="k3">=</span> CPUTIME<span class="k2">;</span>
   <a href="http://www.delorie.com/djgpp/doc/libc/libc_624.html" target="_blank">printf</a><span class="k2">(</span><span class="s">"CPU time = %f secs.\n"</span>,t1-t0<span class="k2">)</span><span class="k2">;</span>
   <a href="http://www.delorie.com/djgpp/doc/libc/libc_624.html" target="_blank">printf</a><span class="k2">(</span><span class="s">"zacc = %d\n"</span>,zacc<span class="k2">)</span><span class="k2">;</span>
</pre></div></div><p>

the results are then:</p><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>
sizeof(long long) = 8, sizeof(fixed) = 4<br />Timing 200000000 calls to fixmulf...CPU time = 1.651749 secs.<br />zacc = -1382998619<br />Timing 200000000 calls to fixmuli...CPU time = 3.685439 secs.<br />zacc = -2117056946<br />Timing 200000000 calls to fixmull...CPU time = 3.353491 secs.<br />zacc = 516138025<br />Timing 200000000 calls to fixdivf...CPU time = 7.178909 secs.<br />zacc = -1716950756<br />Timing 200000000 calls to fixdivl...CPU time = 8.306736 secs.<br />zacc = -301320445
</p></div></div><p>
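For anyone wanting to reproduce this, here is a minimal self-contained sketch of the same idea. It assumes a 16.16 format; fixed_t, mul64_sketch and run_loop are names made up for illustration, not Allegro&#39;s, and the multiply is only a stand-in in the spirit of fixmull. The point is just that every result feeds a value that gets printed, so the compiler cannot simply discard the work:</p><div class="source-code snippet"><div class="inner"><pre>#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;
#include &lt;time.h&gt;

typedef int32_t fixed_t;   /* hypothetical stand-in for a 16.16 fixed type */

/* Hypothetical 64-bit multiply in the spirit of fixmull; not Allegro code.
   Right-shifting a negative value is implementation-defined in C; common
   compilers perform an arithmetic shift. */
static fixed_t mul64_sketch(fixed_t x, fixed_t y)
{
   return (fixed_t)(((int64_t)x * y) &gt;&gt; 16);
}

/* Call the multiply count times and fold every result into a checksum. */
static uint32_t run_loop(uint32_t count)
{
   uint32_t x = 1u &lt;&lt; 16;   /* 1.0, a known start value */
   uint32_t y = 3u &lt;&lt; 15;   /* 1.5 */
   uint32_t acc = 0;        /* unsigned so that wrap-around is well defined */
   uint32_t c;

   for (c = 0; c &lt; count; c++) {
      acc += (uint32_t)mul64_sketch((fixed_t)x, (fixed_t)y);
      x += 1317;
      y += 7143;
   }
   return acc;
}

int main(void)
{
   const uint32_t count = 1000000;
   clock_t t0 = clock();
   uint32_t acc = run_loop(count);
   clock_t t1 = clock();

   /* Printing acc makes the results observable, so they cannot be dropped. */
   printf("CPU time = %f secs.\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
   printf("acc = %lu\n", (unsigned long)acc);
   return 0;
}
</pre></div></div><p>
Keeping x and y unsigned in the loop is a choice made for this sketch only, so that the repeated additions wrap around instead of overflowing a signed int.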
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Peter Wang)</author>
		<pubDate>Tue, 17 May 2005 16:56:23 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>EDIT to earlier post, moved down due to Peter Wang&#39;s comment in the meantime:</p><p>I found a reason that your code is a very poor example when rifling through the assembly: the result of z is never used so gcc doesn&#39;t calculate it in some of the loops! Hence your fixmul loops aren&#39;t actually doing the multiplication and it is no wonder that you suddenly get the same results for everything!</p><p>I changed the z calculation to a += and added it as output to the printfs to make sure that it is actually calculated every loop.</p><p>Meaningful results (O3) are:</p><p>sizeof(long long) = 8, sizeof(fixed) = 4<br />Timing 100000000 calls to fixmulf...CPU time = 8.065064 secs.<br />Timing 100000000 calls to fixmuli...CPU time = 2.120938 secs.<br />Timing 100000000 calls to fixmull...CPU time = 1.212411 secs.<br />Timing 100000000 calls to fixdivf...CPU time = 16.796364 secs.<br />Timing 100000000 calls to fixdivl...CPU time = 19.028576 secs.</p><p>I shall look into that fixdivf vs fixdivl.</p><p>EDIT: so as not to confuse things, I ran Peter Wang&#39;s adaptation. Results are:</p><p>O2:<br />sizeof(long long) = 8, sizeof(fixed) = 4<br />Timing 200000000 calls to fixmulf...CPU time = 16.082407 secs.<br />Timing 200000000 calls to fixmuli...CPU time = 3.940291 secs.<br />Timing 200000000 calls to fixmull...CPU time = 2.122170 secs.<br />Timing 200000000 calls to fixdivf...CPU time = 33.572767 secs.<br />Timing 200000000 calls to fixdivl...CPU time = 38.042508 secs.</p><p>No optimisations:<br />sizeof(long long) = 8, sizeof(fixed) = 4<br />Timing 200000000 calls to fixmulf...CPU time = 89.162385 secs.<br />Timing 200000000 calls to fixmuli...CPU time = 30.188200 secs.<br />Timing 200000000 calls to fixmull...CPU time = 23.056884 secs.<br />Timing 200000000 calls to fixdivf...CPU time = 115.471288 secs.<br />Timing 200000000 calls to fixdivl...CPU time = 63.658895 secs.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Thomas Harte)</author>
		<pubDate>Tue, 17 May 2005 16:57:24 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Ah, you&#39;re quite right. I should have realized the compiler would optimize out an unused variable!<br />Without the zacc output removed, my results now look like this:</p><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>Xeon:<br />sizeof(long long) = 8, sizeof(fixed) = 4<br />Timing 100000000 calls to fixmulf...CPU time = 0.891864 secs.<br />Timing 100000000 calls to fixmuli...CPU time = 2.023693 secs.<br />Timing 100000000 calls to fixmull...CPU time = 2.681592 secs.<br />Timing 100000000 calls to fixdivf...CPU time = 3.426479 secs.<br />Timing 100000000 calls to fixdivl...CPU time = 3.107528 secs.</p><p>Celeron:<br />sizeof(long long) = 8, sizeof(fixed) = 4<br />Timing 100000000 calls to fixmulf...CPU time = 2.480000 secs.<br />Timing 100000000 calls to fixmuli...CPU time = 2.620000 secs.<br />Timing 100000000 calls to fixmull...CPU time = 2.880000 secs.<br />Timing 100000000 calls to fixdivf...CPU time = 13.010000 secs.<br />Timing 100000000 calls to fixdivl...CPU time = 8.570000 secs.</p><p>AMD64:<br />sizeof(long long) = 8, sizeof(fixed) = 4<br />Timing 100000000 calls to fixmulf...CPU time = 0.445932 secs.<br />Timing 100000000 calls to fixmuli...CPU time = 0.717891 secs.<br />Timing 100000000 calls to fixmull...CPU time = 0.227965 secs.<br />Timing 100000000 calls to fixdivf...CPU time = 1.306801 secs.<br />Timing 100000000 calls to fixdivl...CPU time = 4.045385 secs.
</p></div></div><p>

Basically, fixmull is a good idea on the AMD64, but a bad idea on the two Intel machines. For fixdivl, it is actually the other way around (!). I&#39;ll look at the detailed assembler output later on.<br />For now, another relevant statistic may be the gcc version involved. The Celeron runs gcc 3.3.1, the AMD64 runs gcc 3.4.1 and the Xeon runs gcc 3.4.3.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Evert)</author>
		<pubDate>Tue, 17 May 2005 17:17:00 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>I&#39;ve got &quot;gcc version 4.0.0 20041026 (Apple Computer, Inc. build 4061)&quot;. Having been code digging, it appears that the 64bit divide calls some other function for a reason I don&#39;t fully understand yet given that the PowerPC has the divd 64 bit divide instruction. I&#39;m going to sniff around this topic and see if I can figure anything out.</p><p>EDIT: from a Mac vs PC viewpoint, is the fact that my 667MHz G4 is apparently more than twice as fast as the 950MHz Celeron for multiplies (but less than half the speed for divides) surprising? In the sense that it may suggest problems with the test.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Thomas Harte)</author>
		<pubDate>Tue, 17 May 2005 17:28:49 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title">Quote:</div><div class="quote"><p>from a Mac vs PC viewpoint, is the fact that my 667MHz G4 is apparently more than twice as fast as the 950MHz Celeron for multiplies (but less than half the speed for divides) surprising?</p></div></div><p>
I don&#39;t think so. I used 300MHz Sun workstations that were faster for some calculations.</p><p>That said, it&#39;s quite possible I bodged up the test somewhere else too. I&#39;m not a computer professional, after all, so it&#39;s possible that there is some benchmarking subtlety I&#39;m unaware of. It looks like it should be doing the right thing now though.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (Evert)</author>
		<pubDate>Tue, 17 May 2005 22:30:42 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>This data is for MSVC 7.1 / Athlon XP 2500+</p><p>A few notes:</p><p>All of the different methods produce different results.  Most handle rounding differently from each other, and fixmulf handles overflow differently.  I suggest modifying the source to initialize x and y to some known value before each loop.  </p><p>Enabling SSE1 (which was disabled by default; this is MSVC 7.1) improves performance of fixmulf by 20%.  A simple by-hand translation of fixmull to inline asm improved performance by 10% despite the inefficiencies of inline asm on MSVC (you can&#39;t tell the compiler where you want stuff and what gets clobbered and all that like you can in gcc).  </p><p>Performance scaled linearly on my computer for loop counts all the way down to 10^3, for all functions except fixmulf, which only scaled down to loop count = 10^5 (at 10^4 it took almost twice as long per loop, at 10^3 it took over 3 times as long per loop).  </p><div class="source-code snippet"><div class="inner"><pre>LOOP_COUNT <span class="k3">=</span> <span class="n">200000000</span>
total seconds
  <span class="n">2</span>.<span class="n">100045</span>    fixmulf
  <span class="n">2</span>.<span class="n">185031</span>    fixmuli
  <span class="n">1</span>.<span class="n">431395</span>    fixmull
  <span class="n">1</span>.<span class="n">278765</span>    fixmulasm
  <span class="n">7</span>.<span class="n">206338</span>    fixdivf
 <span class="n">12</span>.<span class="n">066815</span>    fixdivl
</pre></div></div><p>
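To illustrate the rounding point from the notes above, here is a small sketch rather than the actual Allegro code. It assumes a 16.16 format, that the float route rounds to nearest when converting back, and that the 64-bit route just shifts the low bits away; fixed_t, mul_via_double and mul_via_int64 are hypothetical names, and neither function deals with overflow:</p><div class="source-code snippet"><div class="inner"><pre>#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

typedef int32_t fixed_t;   /* hypothetical stand-in for a 16.16 fixed type */

/* Multiply through double and round to nearest when converting back.
   Assumption: this mirrors a float-based route; overflow is not handled. */
static fixed_t mul_via_double(fixed_t x, fixed_t y)
{
   double prod = ((double)x / 65536.0) * ((double)y / 65536.0) * 65536.0;
   return (fixed_t)(prod + (prod &lt; 0 ? -0.5 : 0.5));
}

/* Multiply in 64 bits and shift, which drops the low 16 bits. */
static fixed_t mul_via_int64(fixed_t x, fixed_t y)
{
   return (fixed_t)(((int64_t)x * y) &gt;&gt; 16);
}

int main(void)
{
   fixed_t x = 3;        /* 3/65536, a known start value */
   fixed_t y = 0x8000;   /* 0.5 */

   /* The exact product is 1.5 raw units, so the two routes disagree. */
   printf("double: %ld\n", (long)mul_via_double(x, y));
   printf("int64:  %ld\n", (long)mul_via_int64(x, y));
   return 0;
}
</pre></div></div><p>
With x = 3 and y = 0x8000 the rounding version returns 2 and the shifting version returns 1, which is why accumulated checksums only become comparable across methods if the methods agree on rounding, and why x and y need to start from the same known values in each loop.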
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (orz)</author>
		<pubDate>Tue, 17 May 2005 23:26:23 +0000</pubDate>
	</item>
</rss>
