<?xml version="1.0"?>
<rss version="2.0">
	<channel>
		<title>Vertex Buffers - Allegro Primitives mach_msg_trap</title>
		<link>http://www.allegro.cc/forums/view/617045</link>
		<description>Allegro.cc Forum Thread</description>
		<webMaster>matthew@allegro.cc (Matthew Leverton)</webMaster>
		<lastBuildDate>Mon, 18 Sep 2017 05:47:16 +0000</lastBuildDate>
	</channel>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>(Sorry, this will be slightly long...)</p><p><b>Brief background</b><br />I&#39;m trying to help optimise an Allegro program that uses the primitives addon for drawing. All desired output is displayed; the problem is performance, not functionality.</p><p>Profiling showed that the key slowdown was in a function that creates a vertex buffer. It calls al_create_vertex_buffer (providing no initial data, just specifying the number of vertices), then al_lock_vertex_buffer, then writes the vertex data to the buffer, and finally calls al_unlock_vertex_buffer.</p><p>The call to al_unlock_vertex_buffer had a cost 8000 times that of anything else according to my profiler (Instruments on macOS) and accounted for approximately 80% of our total execution time. Unfolding the stack trace showed that the OpenGL calls underneath it always ended up at mach_msg_trap, which was where all the time was being spent.</p><p><b>Initial fix attempted</b><br />I tried rewriting the function to write the vertex data to the stack as an array of ALLEGRO_VERTEX structs and then calling al_create_vertex_buffer with that array as the initial data, so the lock and unlock are no longer needed.</p><p>This took away all the delay from buffer creation BUT added that same delay to the first time the buffer was used for a drawing operation (only the first time though; the speed was fine for subsequent draws). This time the profiler showed the first draw operation ending up in mach_msg_trap rather than the buffer creation.</p><p><b>Next idea</b><br />I&#39;ll be honest, I know almost nothing about using OpenGL. I read a basic tutorial that mentioned calling glEnableClientState(GL_VERTEX_ARRAY) after creating and before using a buffer, so as an experiment I added this to prim_opengl.c on line 670. The slowdown vanished almost entirely and mach_msg_trap no longer appeared in the profile. 
However, each draw call took about 10% longer than it had without this command (total execution time was still less than half what it had been before, but it didn&#39;t look right that the draws were taking longer).</p><p>A bit more reading tells me that glEnableClientState is deprecated and is meant to have been replaced by glEnableVertexAttribArray, which is called before the drawing operation by setup_state within prim_opengl.c.</p><p><b>Current thoughts</b><br />It seems to me that, for whatever reason, the first time glEnableVertexAttribArray is called for a given array it doesn&#39;t work properly in this context. Some printfs showed me that the expected path was being taken through setup_state.</p><p><b>Questions</b><br />1. Has anyone seen an issue like this before? (Google found me nothing.)<br />2. Any ideas why this may be happening?</p><p>(If relevant, I&#39;m doing this testing on a MacBook Pro with Intel Iris Pro graphics.)</p><p><b>Code</b><br />Original version of the vbo_upload function:
</p><div class="source-code"><div class="inner"><span class="k1">bool</span>
vbo_upload<span class="k2">(</span>vbo_t<span class="k3">*</span> it<span class="k2">)</span>
<span class="k2">{</span>
  ALLEGRO_VERTEX_BUFFER<span class="k3">*</span> buffer<span class="k2">;</span>
  <a href="http://www.allegro.cc/manual/ALLEGRO_VERTEX"><span class="a">ALLEGRO_VERTEX</span></a><span class="k3">*</span>        entries<span class="k2">;</span>
  vertex_t<span class="k3">*</span>              vertex<span class="k2">;</span>

  iter_t iter<span class="k2">;</span>

  <span class="k1">if</span> <span class="k2">(</span>it-&gt;buffer <span class="k3">!</span><span class="k3">=</span> NULL<span class="k2">)</span> <span class="k2">{</span>
    al_destroy_vertex_buffer<span class="k2">(</span>it-&gt;buffer<span class="k2">)</span><span class="k2">;</span>
    it-&gt;buffer <span class="k3">=</span> NULL<span class="k2">;</span>
  <span class="k2">}</span>

  <span class="c">// create the vertex buffer object</span>
  <span class="k1">if</span> <span class="k2">(</span><span class="k3">!</span><span class="k2">(</span>buffer <span class="k3">=</span> al_create_vertex_buffer<span class="k2">(</span>NULL, NULL, vector_len<span class="k2">(</span>it-&gt;vertices<span class="k2">)</span>, ALLEGRO_PRIM_BUFFER_STATIC<span class="k2">)</span><span class="k2">)</span><span class="k2">)</span>
    <span class="k1">return</span> <span class="k1">false</span><span class="k2">;</span>

  <span class="c">// upload vertices to the GPU</span>
  <span class="k1">if</span> <span class="k2">(</span><span class="k3">!</span><span class="k2">(</span>entries <span class="k3">=</span> al_lock_vertex_buffer<span class="k2">(</span>buffer, <span class="n">0</span>, vector_len<span class="k2">(</span>it-&gt;vertices<span class="k2">)</span>, ALLEGRO_LOCK_WRITEONLY<span class="k2">)</span><span class="k2">)</span><span class="k2">)</span> <span class="k2">{</span>
    al_destroy_vertex_buffer<span class="k2">(</span>buffer<span class="k2">)</span><span class="k2">;</span>
    <span class="k1">return</span> <span class="k1">false</span><span class="k2">;</span>
  <span class="k2">}</span>
  iter <span class="k3">=</span> vector_enum<span class="k2">(</span>it-&gt;vertices<span class="k2">)</span><span class="k2">;</span>
  <span class="k1">while</span> <span class="k2">(</span>iter_next<span class="k2">(</span><span class="k3">&amp;</span>iter<span class="k2">)</span><span class="k2">)</span> <span class="k2">{</span>
    vertex <span class="k3">=</span> iter.ptr<span class="k2">;</span>
    entries<span class="k2">[</span>iter.index<span class="k2">]</span>.x <span class="k3">=</span> vertex-&gt;x<span class="k2">;</span>
    entries<span class="k2">[</span>iter.index<span class="k2">]</span>.y <span class="k3">=</span> vertex-&gt;y<span class="k2">;</span>
    entries<span class="k2">[</span>iter.index<span class="k2">]</span>.z <span class="k3">=</span> vertex-&gt;z<span class="k2">;</span>
    entries<span class="k2">[</span>iter.index<span class="k2">]</span>.u <span class="k3">=</span> vertex-&gt;u<span class="k2">;</span>
    entries<span class="k2">[</span>iter.index<span class="k2">]</span>.v <span class="k3">=</span> vertex-&gt;v<span class="k2">;</span>
    entries<span class="k2">[</span>iter.index<span class="k2">]</span>.color <span class="k3">=</span> nativecolor<span class="k2">(</span>vertex-&gt;color<span class="k2">)</span><span class="k2">;</span>
  <span class="k2">}</span>
  al_unlock_vertex_buffer<span class="k2">(</span>buffer<span class="k2">)</span><span class="k2">;</span> <span class="c">// &lt;- all the delay was here</span>

  it-&gt;buffer <span class="k3">=</span> buffer<span class="k2">;</span>
  <span class="k1">return</span> <span class="k1">true</span><span class="k2">;</span>
<span class="k2">}</span>
</div></div><p>

Re-written vbo_upload (defers delay to first draw):
</p><div class="source-code"><div class="inner"><span class="k1">bool</span>
vbo_upload<span class="k2">(</span>vbo_t<span class="k3">*</span> it<span class="k2">)</span>
<span class="k2">{</span>
  ALLEGRO_VERTEX_BUFFER<span class="k3">*</span> buffer<span class="k2">;</span>
  vertex_t<span class="k3">*</span>              vertex<span class="k2">;</span>

  iter_t iter<span class="k2">;</span>

  <a href="http://www.allegro.cc/manual/ALLEGRO_VERTEX"><span class="a">ALLEGRO_VERTEX</span></a> vertices<span class="k2">[</span>vector_len<span class="k2">(</span>it-&gt;vertices<span class="k2">)</span><span class="k2">]</span><span class="k2">;</span>

  iter <span class="k3">=</span> vector_enum<span class="k2">(</span>it-&gt;vertices<span class="k2">)</span><span class="k2">;</span>
  <span class="k1">while</span> <span class="k2">(</span>iter_next<span class="k2">(</span><span class="k3">&amp;</span>iter<span class="k2">)</span><span class="k2">)</span> <span class="k2">{</span>
    vertex <span class="k3">=</span> iter.ptr<span class="k2">;</span>
    vertices<span class="k2">[</span>iter.index<span class="k2">]</span>.x <span class="k3">=</span> vertex-&gt;x<span class="k2">;</span>
    vertices<span class="k2">[</span>iter.index<span class="k2">]</span>.y <span class="k3">=</span> vertex-&gt;y<span class="k2">;</span>
    vertices<span class="k2">[</span>iter.index<span class="k2">]</span>.z <span class="k3">=</span> vertex-&gt;z<span class="k2">;</span>
    vertices<span class="k2">[</span>iter.index<span class="k2">]</span>.u <span class="k3">=</span> vertex-&gt;u<span class="k2">;</span>
    vertices<span class="k2">[</span>iter.index<span class="k2">]</span>.v <span class="k3">=</span> vertex-&gt;v<span class="k2">;</span>
    vertices<span class="k2">[</span>iter.index<span class="k2">]</span>.color <span class="k3">=</span> nativecolor<span class="k2">(</span>vertex-&gt;color<span class="k2">)</span><span class="k2">;</span>
  <span class="k2">}</span>

  <span class="k1">if</span> <span class="k2">(</span>it-&gt;buffer <span class="k3">!</span><span class="k3">=</span> NULL<span class="k2">)</span> <span class="k2">{</span>
    al_destroy_vertex_buffer<span class="k2">(</span>it-&gt;buffer<span class="k2">)</span><span class="k2">;</span>
    it-&gt;buffer <span class="k3">=</span> NULL<span class="k2">;</span>
  <span class="k2">}</span>

  <span class="c">// create the vertex buffer object</span>
  <span class="k1">if</span> <span class="k2">(</span><span class="k3">!</span><span class="k2">(</span>buffer <span class="k3">=</span> al_create_vertex_buffer<span class="k2">(</span>NULL, vertices, vector_len<span class="k2">(</span>it-&gt;vertices<span class="k2">)</span>, ALLEGRO_PRIM_BUFFER_STATIC<span class="k2">)</span><span class="k2">)</span><span class="k2">)</span>
    <span class="k1">return</span> <span class="k1">false</span><span class="k2">;</span>

  it-&gt;buffer <span class="k3">=</span> buffer<span class="k2">;</span>
  <span class="k1">return</span> <span class="k1">true</span><span class="k2">;</span>
<span class="k2">}</span>
</div></div><p>

The draw is done using:<br />al_draw_vertex_buffer(vbo_buffer(shape-&gt;vbo), bitmap, 0, num_vertices, draw_mode);</p><p>num_vertices is the number of vertices used when creating the buffer, bitmap is a separately specified image used to texture the shape, and vbo_buffer simply returns the relevant buffer.</p><p>The buffer is not edited by anything else.
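</p><p>One side note on the rewritten vbo_upload above: it stages the vertices in a variable-length array on the stack, which may overflow the stack for large vertex counts. Below is a minimal sketch of staging on the heap instead, written in plain C so that it stands alone. The names app_vertex, staged_vertex and stage_vertices are hypothetical, the packed RGBA colour is an assumption, and staged_vertex only mimics the x, y, z, u, v, color fields of ALLEGRO_VERTEX; real code would fill ALLEGRO_VERTEX directly.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical application-side vertex, standing in for the thread's vertex_t. */
typedef struct {
    float x, y, z;
    float u, v;
    unsigned int rgba; /* assumption: colour packed as 0xRRGGBBAA */
} app_vertex;

/* Stand-in for ALLEGRO_VERTEX: position, texture coordinates, float colour. */
typedef struct {
    float x, y, z;
    float u, v;
    float r, g, b, a;
} staged_vertex;

/* Convert `count` application vertices into a heap-allocated staging array.
 * Returns NULL when count is 0, on size overflow, or when malloc fails.
 * The caller owns the result and must free() it. */
staged_vertex *stage_vertices(const app_vertex *src, size_t count)
{
    staged_vertex *dst;
    size_t i;

    assert(src != NULL || count == 0);
    if (count == 0 || count > SIZE_MAX / sizeof *dst)
        return NULL;

    dst = malloc(count * sizeof *dst);
    if (dst == NULL)
        return NULL;

    for (i = 0; i < count; i++) {
        dst[i].x = src[i].x;
        dst[i].y = src[i].y;
        dst[i].z = src[i].z;
        dst[i].u = src[i].u;
        dst[i].v = src[i].v;
        /* Unpack each 8-bit channel into the 0.0 to 1.0 range. */
        dst[i].r = (float)((src[i].rgba >> 24) & 0xFFu) / 255.0f;
        dst[i].g = (float)((src[i].rgba >> 16) & 0xFFu) / 255.0f;
        dst[i].b = (float)((src[i].rgba >> 8) & 0xFFu) / 255.0f;
        dst[i].a = (float)(src[i].rgba & 0xFFu) / 255.0f;
    }
    return dst;
}
```

<p>The staged array would then be passed as the initial data argument of al_create_vertex_buffer and freed afterwards, assuming (as I understand the documentation) that Allegro copies the initial data into the buffer. This only changes where the staging memory lives; I would not expect it to affect the delay itself.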
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (rhuanjl)</author>
		<pubDate>Sat, 16 Sep 2017 14:54:40 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>ALLEGRO_PRIM_BUFFER_STATIC doesn&#39;t seem right to me; ALLEGRO_PRIM_BUFFER_STREAM or one of the other flags might work better.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (beoran)</author>
		<pubDate>Sat, 16 Sep 2017 21:45:48 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><div class="quote_container"><div class="title"><a href="http://www.allegro.cc/forums/thread/617045/1032559#target">beoran</a> said:</div><div class="quote"><p>ALLEGRO_PRIM_BUFFER_STATIC doesn&#39;t seem right to me, ALLEGRO_PRIM_BUFFER_STREAM or the other flags might work better.</p></div></div><p>
Thanks for the suggestion. I&#39;ve just tried it; unfortunately, changing flags did not seem to produce any gain.</p><p>Note that the intention is to write to any given buffer only once, in the vbo_upload function shown above, and then to draw it many times; hence the initial choice of ALLEGRO_PRIM_BUFFER_STATIC.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (rhuanjl)</author>
		<pubDate>Sat, 16 Sep 2017 22:45:50 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>Hmmm, it seems like this could be an Allegro performance bug on macOS. Perhaps the OpenGL workaround you found could be helpful in fixing this.</p><p>However, mach_msg_trap seems to be a bit of a red herring:<br /><a href="https://stackoverflow.com/questions/1488601/how-to-find-out-what-mach-msg-trap-waits-for">https://stackoverflow.com/questions/1488601/how-to-find-out-what-mach-msg-trap-waits-for</a><br /><a href="https://stackoverflow.com/questions/7945016/how-to-optimize-mach-msg-trap">https://stackoverflow.com/questions/7945016/how-to-optimize-mach-msg-trap</a>
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (beoran)</author>
		<pubDate>Sun, 17 Sep 2017 11:06:50 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p><b>beoran:</b> thanks for looking at this for me.</p><p>Some further googling and reading around suggests that I&#39;m not the only person who&#39;s had problems on macOS with OpenGL code that uses glEnableVertexAttribArray, and it does sound like it&#39;s probably a macOS-specific issue.</p><p>But what I can&#39;t see anywhere is a solution. Thankfully the code runs, and as it&#39;s only one delay per VBO it&#39;s not disastrous: I can still create a 4-vertex VBO in 0.15 milliseconds; I just think it should take more like 0.04 or so. I should probably test higher vertex counts and see if it becomes a more significant issue.</p><p>And I suppose if I want a fix I need to read some macOS-specific OpenGL 3/4 guides.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (rhuanjl)</author>
		<pubDate>Sun, 17 Sep 2017 13:08:20 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>I am a bit confused by your setup. Are you continually creating vertex buffers? They are meant to be created once and reused multiple times.
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (SiegeLord)</author>
		<pubDate>Mon, 18 Sep 2017 04:23:09 +0000</pubDate>
	</item>
	<item>
		<description><![CDATA[<div class="mockup v2"><p>@SiegeLord: I wouldn&#39;t normally be continually creating VBOs; I&#39;m well aware that they&#39;re designed to be made once and used many times.</p><p>To test performance, I had the code create 20,000 VBOs and then draw each of them 10 times, which is where the time measurements I&#39;ve mentioned come from.</p><p>On the MacBook Pro I&#39;m using, creating the 20,000 VBOs takes 3-3.5 seconds with the original version of the function and around 0.5 seconds with the edited version.</p><p>With the original version, the draw operations take about 1 second (all 200,000 draws). With the edited version, the first draw operations for each VBO (i.e. the first 20,000 operations) collectively take 3-4 seconds, with the remaining 180,000 taking &lt; 1 second.</p><p>Conversely, if I added glEnableClientState(GL_VERTEX_ARRAY) to the relevant line within al_create_vertex_buffer as well as swapping to the alternate loading function, the creation process dropped to 0.5 seconds and all the draws together took around 1.1 seconds.
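</p><p>The measurement described above could be sketched roughly as follows. This is a minimal sketch, not the actual test code: run_bench, bench_result, per_op_ms, call_counts and the two counting stubs are hypothetical names, and the stubs only count calls. In a real test the callbacks would wrap vbo_upload and al_draw_vertex_buffer. It measures wall-clock time on the CPU side only, and OpenGL may finish GPU work asynchronously, so the split between phases is approximate.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Callback types: hypothetical wrappers around the real create and draw calls. */
typedef void (*create_fn)(size_t index, void *ctx);
typedef void (*draw_fn)(size_t index, void *ctx);

/* Wall-clock totals, in seconds, for each phase of the test. */
typedef struct {
    double create_s;      /* creating every buffer */
    double first_draw_s;  /* first draw of every buffer */
    double later_draws_s; /* all remaining draws */
} bench_result;

/* Current wall-clock time in seconds (C11 timespec_get). */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Create `buffers` buffers, draw each once, then draw each a further
 * (draws_each - 1) times, timing the three phases separately. */
bench_result run_bench(size_t buffers, size_t draws_each,
                       create_fn create, draw_fn draw, void *ctx)
{
    bench_result r = {0.0, 0.0, 0.0};
    double t0, t1, t2, t3;
    size_t i, pass;

    assert(create != NULL && draw != NULL);
    assert(draws_each >= 1);

    t0 = now_seconds();
    for (i = 0; i < buffers; i++)
        create(i, ctx);
    t1 = now_seconds();
    for (i = 0; i < buffers; i++)
        draw(i, ctx);
    t2 = now_seconds();
    for (pass = 1; pass < draws_each; pass++)
        for (i = 0; i < buffers; i++)
            draw(i, ctx);
    t3 = now_seconds();

    r.create_s = t1 - t0;
    r.first_draw_s = t2 - t1;
    r.later_draws_s = t3 - t2;
    return r;
}

/* Average cost of one operation in milliseconds. */
double per_op_ms(double total_s, size_t ops)
{
    assert(ops > 0);
    return total_s * 1000.0 / (double)ops;
}

/* Placeholder callbacks that only count how often they are called. */
typedef struct {
    size_t creates;
    size_t draws;
} call_counts;

void count_create(size_t index, void *ctx)
{
    (void)index;
    ((call_counts *)ctx)->creates++;
}

void count_draw(size_t index, void *ctx)
{
    (void)index;
    ((call_counts *)ctx)->draws++;
}
```

<p>With the figures above, per_op_ms(3.0, 20000) works out to 0.15 ms per buffer for the original version and per_op_ms(0.5, 20000) to 0.025 ms for the edited one.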
</p></div>]]>
		</description>
		<author>no-reply@allegro.cc (rhuanjl)</author>
		<pubDate>Mon, 18 Sep 2017 05:47:16 +0000</pubDate>
	</item>
</rss>
