Allegro.cc - Online Community

Primitive Addon ridiculously slow?
EternalGames
Member #14,603
October 2012

Hey guys,
I've been programming a minimap for a game I'm making and for the minimap I wanted to draw rectangles whose colors fit the terrain type, but somehow calling the following simple function costs me ELEVEN MILLION CPU cycles. Since this function is called 60 times per second this is absolutely unacceptable and I have no idea why this costs that many cycles. ???

void State_Playing::DrawMinimap()
{
  for(int i = 0; i<fieldWidth; i++)//fieldWidth is 40
  {
    for(int j = 0; j< fieldHeight; j++)//fieldHeight is 40
    {
      al_draw_filled_rectangle(screenWidth-130+i*3,screenHeight-130+j*3,screenWidth-130+i*3+2,screenHeight-130+j*3+2,GetColor(map.map.at(j+i*fieldWidth)));
    }
  }
}

My question is firstly why the al_draw_filled_rectangle() function costs so much and secondly, what a better approach to draw the minimap would be.
The minimap has to be updated every time the player builds something and possibly later I'd want to make it so that it also shows the position of enemies, so it would be updated every frame.

Thanks for any advice to come. ;D

Kris Asick
Member #1,424
July 2001

The Primitives addon functions have a HUGE overhead to call, so the less you have to call them, the better.

Don't give up hope though! There is a solution because I ran into a similar problem. That's how I learned about this in the first place. What you want to do is set up a massive number of triangles into an array of ALLEGRO_VERTEX structures, then simply make a single call to al_draw_prim() and feed it the array you've created using the ALLEGRO_PRIM_TRIANGLE_LIST mode.

Yes, this means you'll have to draw rectangles as pairs of triangles. :P
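
A rough sketch of the batching idea described above, for anyone who wants to see the geometry. It is not Allegro code: `Color` and `Vertex` are hypothetical stand-ins for ALLEGRO_COLOR and ALLEGRO_VERTEX so that it compiles without the library, and `pushRect` and `buildMinimapTriangles` are made-up helper names. In a real program the array would presumably hold ALLEGRO_VERTEX entries and be handed to al_draw_prim() in ALLEGRO_PRIM_TRIANGLE_LIST mode in one call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for ALLEGRO_COLOR / ALLEGRO_VERTEX (assumption:
// only position and colour matter for this sketch).
struct Color { float r, g, b, a; };
struct Vertex { float x, y, z; Color color; };

// Appends one filled rectangle as two triangles (six vertices).
inline void pushRect(std::vector<Vertex>& out, float x1, float y1,
                     float x2, float y2, Color c)
{
    const Vertex tl{x1, y1, 0.0f, c}, tr{x2, y1, 0.0f, c};
    const Vertex bl{x1, y2, 0.0f, c}, br{x2, y2, 0.0f, c};
    // Triangle 1: top-left, top-right, bottom-right.
    out.push_back(tl); out.push_back(tr); out.push_back(br);
    // Triangle 2: top-left, bottom-right, bottom-left.
    out.push_back(tl); out.push_back(br); out.push_back(bl);
}

// Builds the whole minimap as a single triangle list. Cells are 2x2 pixels
// on a 3-pixel pitch, like the loop in the first post. Colours are stored
// column by column (index = j + i * fieldHeight), which matches the original
// indexing when the field is square.
inline std::vector<Vertex> buildMinimapTriangles(
    const std::vector<Color>& cellColors, int fieldWidth, int fieldHeight,
    float originX, float originY)
{
    const std::size_t cells =
        static_cast<std::size_t>(fieldWidth) * static_cast<std::size_t>(fieldHeight);
    assert(cellColors.size() >= cells);

    std::vector<Vertex> verts;
    verts.reserve(cells * 6);
    for (int i = 0; i < fieldWidth; ++i) {
        for (int j = 0; j < fieldHeight; ++j) {
            const float x = originX + i * 3.0f;
            const float y = originY + j * 3.0f;
            const std::size_t idx = static_cast<std::size_t>(j) +
                static_cast<std::size_t>(i) * static_cast<std::size_t>(fieldHeight);
            pushRect(verts, x, y, x + 2.0f, y + 2.0f, cellColors[idx]);
        }
    }
    return verts;
}
```

For a 40x40 field this produces 9600 vertices, which is still a small array to submit once per frame compared with 1600 separate drawing calls.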

Your other option is to create a blank white texture bitmap and use al_draw_tinted_bitmap() or any of those functions in-between calls to al_hold_bitmap_drawing(true) and al_hold_bitmap_drawing(false). That should give you lightning-fast results as well.

Just to warn you though: The CPU is the major hindrance to doing 2D hardware-accelerated graphics. Drawing 40x40 tiles per frame is 1600 drawing calls per frame. My AMD Athlon 64 2.0 GHz Dual-Core system can handle about 10,000 calls to al_draw_bitmap() functions per frame before the framerate dips. Yeah, you're still a good ways away from that, but you should determine what your limit is, what kind of system you're aiming to build your game for, and code accordingly. ;)
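
To put the numbers in this thread side by side, here is a small back-of-the-envelope calculation. The field size, the 11 million cycles and the 60 calls per second come from the first post; the 2.0 GHz clock is only an assumption borrowed from the machine mentioned above, and all the names are made up for illustration.

```cpp
#include <cstdint>

constexpr std::int64_t kFieldWidth = 40;
constexpr std::int64_t kFieldHeight = 40;
constexpr std::int64_t kCyclesPerFrame = 11000000;   // reported in the first post
constexpr std::int64_t kFramesPerSecond = 60;        // reported in the first post
constexpr std::int64_t kAssumedClockHz = 2000000000; // assumption: a 2.0 GHz core

// One al_draw_filled_rectangle() call per minimap cell.
constexpr std::int64_t drawCallsPerFrame()
{
    return kFieldWidth * kFieldHeight;
}

// Average cost of a single call, if the whole budget went into the calls.
constexpr std::int64_t cyclesPerDrawCall()
{
    return kCyclesPerFrame / drawCallsPerFrame();
}

// Total cycles spent on the minimap every second.
constexpr std::int64_t cyclesPerSecond()
{
    return kCyclesPerFrame * kFramesPerSecond;
}

// Share of one core, under the assumed clock speed.
constexpr double coreFraction()
{
    return static_cast<double>(cyclesPerSecond()) /
           static_cast<double>(kAssumedClockHz);
}
```

Under those assumptions that is 1600 calls at roughly 6875 cycles each, or about a third of one core spent on the minimap alone.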

--- Kris Asick (Gemini)
--- http://www.pixelships.com

Arthur Kalliokoski
Second in Command
February 2005

Lots of times, slow rendering is the result of bitmaps getting created before the display, but that wouldn't apply to primitives, would it?

Maybe you could try straight OpenGL and use GL_TRIANGLES or whatever.

“Throughout history, poverty is the normal condition of man. Advances which permit this norm to be exceeded — here and there, now and then — are the work of an extremely small minority, frequently despised, often condemned, and almost always opposed by all right-thinking people. Whenever this tiny minority is kept from creating, or (as sometimes happens) is driven out of a society, the people then slip back into abject poverty. This is known as "bad luck.”

― Robert A. Heinlein

SiegeLord
Member #7,827
October 2006

Kris Asick said:

The Primitives addon functions have a HUGE overhead to call, so the less you have to call them, the better.

I don't get where this comes from. Here's a simple benchmark comparing pure OpenGL and al_draw_filled_rectangle:

#include <allegro5/allegro.h>
#include <allegro5/allegro_primitives.h>
#include <allegro5/allegro_opengl.h>
#include <stdio.h>

const int N = 10000;
const int M = 1000;

int main()
{
  al_init();
  al_init_primitives_addon();
  auto d = al_create_display(800, 600);

  ALLEGRO_VERTEX vtxs[4];
  for(int ii = 0; ii < 4; ii++)
  {
    vtxs[ii].z = 0;
    vtxs[ii].color = al_map_rgb_f(1, 0, 1);
  }
  vtxs[0].x = 10;
  vtxs[0].y = 10;

  vtxs[1].x = 10;
  vtxs[1].y = 200;

  vtxs[2].x = 200;
  vtxs[2].y = 200;

  vtxs[3].x = 200;
  vtxs[3].y = 10;

  double time;
  time = al_get_time();

  glEnableClientState(GL_COLOR_ARRAY);
  glEnableClientState(GL_VERTEX_ARRAY);
  glBindTexture(GL_TEXTURE_2D, 0);

  glVertexPointer(3, GL_FLOAT, sizeof(ALLEGRO_VERTEX), &vtxs[0].x);
  glColorPointer(4, GL_FLOAT, sizeof(ALLEGRO_VERTEX), &vtxs[0].color.r);

  for(int ii = 0; ii < N; ii++)
  {
    al_clear_to_color(al_map_rgb_f(0.5, 0.5, 0.5));
    for(int jj = 0; jj < M; jj++)
    {
      glDrawArrays(GL_TRIANGLE_FAN, 0, 4);
    }
    al_flip_display();
  }

  glDisableClientState(GL_COLOR_ARRAY);
  glDisableClientState(GL_VERTEX_ARRAY);

  printf("Pure OpenGL: %f s\n", al_get_time() - time);

  time = al_get_time();

  for(int ii = 0; ii < N; ii++)
  {
    al_clear_to_color(al_map_rgb_f(0.5, 0.5, 0.5));
    for(int jj = 0; jj < M; jj++)
    {
      al_draw_filled_rectangle(10, 10, 200, 200, al_map_rgb_f(1, 1, 1));
    }
    al_flip_display();
  }

  printf("al_draw_filled_rectangle: %f s\n", al_get_time() - time);

  return 0;
}

Pure OpenGL: 42.320089 s
al_draw_filled_rectangle: 54.432403 s

So yes, there's some overhead, but it's not like it doubles the run time. It's not the primitives addon that's slow, it's OpenGL.

"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18
[SiegeLord's Abode][Codes]:[DAllegro5]:[RustAllegro]

Thomas Fjellstrom
Member #476
June 2000

There is a lot of overhead in the setup for calls to driver functions. You want to minimize it by sending all the data at once, rather than some for each rectangle. Bit of a pain, but that's how the hardware prefers to be used.

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

Kris Asick
Member #1,424
July 2001

SiegeLord said:

I don't get where this comes from.

When I first tried using the primitives system for doing my tile mapping engine my framerate was about 2 FPS. >:(

Mind you, that was making thousands of calls to al_draw_prim() per frame, not al_draw_filled_rectangle(). Still, considering you can shuffle an entire array to al_draw_prim() in a single call, that's probably the fastest way to go about it.

Also Siege, your test code is somewhat flawed since al_flip_display() will cause delays in the timing if your system is vsyncing by default. (Like my system. In fact, it's difficult to get it to NOT vsync, even when disabling it in my 3D card settings.)

One other thought... Would making multiple calls to al_draw_filled_rectangle() actually get sped up if done while al_hold_bitmap_drawing() is in effect?

SiegeLord
Member #7,827
October 2006

Kris Asick said:

Also Siege, your test code is somewhat flawed since al_flip_display() will cause delays in the timing if your system is vsyncing by default. (Like my system. In fact, it's difficult to get it to NOT vsync, even when disabling it in my 3D card settings.)

It is notoriously difficult to benchmark graphical things, since often times the main computation does not occur unless you flip the display. Ultimately you need to turn off vsync to get any reasonable timings.

Quote:

Would making multiple calls to al_draw_filled_rectangle() actually get sped up if done while al_hold_bitmap_drawing() is in effect?

al_hold_bitmap_drawing() has nothing to do with the primitives addon.

Currently the fastest way to draw dynamic data is as you explained in your post: a giant array of triangles. The fastest way to draw static data is a relatively new feature called vertex buffers (al_create_vertex_buffer/al_draw_vertex_buffer).

Most of the slowdown seems to come from a call to _al_opengl_set_blender inside al_draw_prim... I don't know why it's necessary.

Elias
Member #358
May 2000

I get this:

Pure OpenGL: 15.142891
al_draw_filled_rectangle: 15.121058

I almost feel like I did something wrong (vsync is off however, with vsync it would run for 24 hours).

But it means in my case al_draw_filled_rectangle is exactly as fast as pure OpenGL. So there is basically no overhead (if something didn't go wrong with the benchmark in my case).

This is all with glDrawArrays calls of course. Using vertex buffers it would be a lot faster. I think it would be interesting having a test for that as well. And in general, it would be useful if we had a folder in Allegro with lots of tests like these. They could then be run on various hardware after each commit and we'd immediately notice regressions.

--
"Either help out or stop whining" - Evert

Thomas Fjellstrom
Member #476
June 2000

The overhead of the GL draw commands themselves probably outweighs any overhead we have in the primitives addon.

Kris Asick
Member #1,424
July 2001

I don't doubt it, but the primary thing that speeds up al_draw_bitmap() is deferred drawing. If the primitives add-on doesn't do any deferred drawing then it's likely just as GPU/CPU expensive as running numerous al_draw_bitmap() commands without deferred drawing, due to the way hardware-accelerated draw operations are handled in the first place.

Plus, al_draw_prim() is definitely doing something processor intensive because it doesn't take very many calls to it to kill the framerate, yet calling it just once per frame with a massive array of data is perfectly fine. :P

Thomas Fjellstrom
Member #476
June 2000

I tried thinking of a way to do deferred drawing properly with the primitives addon, but it's incredibly hard, if not impossible, to do right (and have it perform better than it does now).

Kris Asick said:

Plus, al_draw_prim() is definitely doing something processor intensive because it doesn't take very many calls to it to kill the framerate, yet calling it just once per frame with a massive array of data is perfectly fine.

Two possibilities:
1. al_draw_prim does some pre-processing for the matrix
2. each al_draw_prim does its own gl vertexarray setup and teardown (the functions that hit the GL driver, and the card).

I did some silly benchmarking on my laptop a while back, and any time some state was changed, or some drawing commands hit the driver, a lot of time was spent in MESA (libGL) and the driver (intel-drv i915) to do a lot of setup.

Basically GL and the hardware itself really prefer that you blast them with as much data as you can in one go, rather than a bunch of little bits.

SiegeLord
Member #7,827
October 2006

Kris Asick said:

Plus, al_draw_prim() is definitely doing something processor intensive because it doesn't take very many calls to it to kill the framerate, yet calling it just once per frame with a massive array of data is perfectly fine. :P

But the question is, would pure OpenGL be any better? So far the answer points to no.

Incidentally, I did try to implement deferred drawing for primitives too, but there was no giant improvement that I could see. Heck, I didn't even see any improvement for the deferred drawing with bitmaps. As I said, these things are very hard to benchmark, and often times you're shooting in the dark.

Kris Asick
Member #1,424
July 2001

Back on topic... I just took another look at the original code and noticed that the colour value is pulled from a special GetColor() function. It's entirely possible the bottleneck for the drawing function is in there and not the fault of al_draw_filled_rectangle()...

This would be especially true if GetColor() is trying to pull colour data from a video bitmap.

Thomas Fjellstrom
Member #476
June 2000

SiegeLord said:

Heck, I didn't even see any improvement for the deferred drawing with bitmaps.

And you're sure you tested where all of the bitmaps were in one single texture vs each bitmap with their own texture? It makes a measurable difference on all hardware that I know of.

EternalGames
Member #14,603
October 2012

Kris Asick said:

Back on topic... I just took another look at the original code and noticed that the colour value is pulled from a special GetColor() function. It's entirely possible the bottleneck for the drawing function is in there and not the fault of al_draw_filled_rectangle()...

Haha no, the GetColor function is actually extremely simple:

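ALLEGRO_COLOR State_Playing::GetColor(int type)
{
  switch(type)
  {
  case Forest: return al_map_rgb(0,80,0);
  case Lava: return al_map_rgb(160,38,40);
  case Water: return al_map_rgb(0,0,100);
  case Mountain: return al_map_rgb(120,120,120);
  case Sand: return al_map_rgb(255,255,100);
  case Plains: return al_map_rgb(0,200,0);
  case Building: return al_map_rgb(0,0,0);
  // Added fallback: without it the function falls off the end for an
  // unknown type, which is undefined behaviour. Magenta makes such cells easy to spot.
  default: return al_map_rgb(255,0,255);
  }
}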

Anyway, I changed my code so that it saves the minimap as a single bitmap and now only has a single drawing call. The map will only be updated when needed, and the positions of enemies will just be drawn separately on top if I decide to implement that feature.
It now basically boils down to this:
al_draw_bitmap(miniMap,screenWidth-130,screenHeight-130,0);
(I'm wondering if I really should ALWAYS put any code in these code tags... the formatting of the code looks really good, though..)

This way my code is fast enough: it went from 11 million CPU cycles to between 20,000 and 60,000 (sometimes more, but I guess that's because Windows does some stuff in between). ;D

Kris Asick
Member #1,424
July 2001

EternalGames said:

Anyway, I changed my code so that it saves the minimap as a single bitmap and now only has a single drawing call.

Always the best approach if your bitmap size isn't bigger than (or not much bigger than) the screen itself. ;)

EternalGames
Member #14,603
October 2012

Kris Asick said:

Always the best approach if your bitmap size isn't bigger than (or not much bigger than) the screen itself. ;)

What's a better approach for large bitmaps? I happen to have saved the whole image for the map (not the minimap) in a texture (about 5000 x 5000 pixels) and draw it using al_draw_bitmap_region().

Is there a better way to do it? ???

Trent Gamblin
Member #261
April 2000

You might want to break it up into sections. 5000 is quite big to some GPUs. I imagine there are some GPUs still in use that only support 4096x4096 textures, and every ALLEGRO_BITMAP is backed by a texture.
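
One way the "break it up into sections" idea could look, as a sketch only. `TileRect`, `splitIntoTiles` and `intersects` are hypothetical names, and the tile sizes used in the note below are arbitrary choices rather than anything Allegro requires. In a real program each rectangle would get its own bitmap, and only the ones overlapping the view would be drawn.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical description of one section of the big map, in map pixels.
struct TileRect { int x, y, w, h; };

// Cuts a mapW x mapH image into sections no larger than maxTile x maxTile.
// Sections on the right and bottom edges may be smaller.
inline std::vector<TileRect> splitIntoTiles(int mapW, int mapH, int maxTile)
{
    assert(mapW > 0 && mapH > 0 && maxTile > 0);
    std::vector<TileRect> tiles;
    for (int y = 0; y < mapH; y += maxTile) {
        for (int x = 0; x < mapW; x += maxTile) {
            tiles.push_back({x, y,
                             std::min(maxTile, mapW - x),
                             std::min(maxTile, mapH - y)});
        }
    }
    return tiles;
}

// True if two rectangles overlap; used to skip sections outside the view.
inline bool intersects(const TileRect& a, const TileRect& b)
{
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}
```

With a 5000x5000 map and a 4096 limit this gives four sections, the edge ones being 904 pixels wide or tall. Smaller sections (say 1024) give 25, of which an 800x600 view touches only a handful at a time.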

Kris Asick
Member #1,424
July 2001

Also consider how big this bitmap actually appears on-screen. If your mini-map is shown at a miniature size, then shrink it down accordingly so it takes up less video memory. (A 4096x4096 bitmap at 32 bits per pixel takes up 64 MB of video RAM!)
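
The 64 MB figure can be checked with a little arithmetic. This sketch assumes 4 bytes per pixel (32-bit RGBA) and no mipmaps or padding, so real drivers may use somewhat more. `textureBytes` and `toMiB` are made-up helper names.

```cpp
#include <cstdint>

// Raw size of a texture, assuming 4 bytes per pixel unless told otherwise.
constexpr std::uint64_t textureBytes(std::uint64_t width, std::uint64_t height,
                                     std::uint64_t bytesPerPixel = 4)
{
    return width * height * bytesPerPixel;
}

// Converts a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
constexpr double toMiB(std::uint64_t bytes)
{
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

By the same arithmetic the 5000x5000 map mentioned earlier comes to 100,000,000 bytes, a little over 95 MiB, while a 120x120 minimap is under 60 KB.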

It really depends on how this mini-map shows up on-screen in the first place and how much memory vs. processing power you want to use, 'cause there's millions of approaches you could take. Context and programming skill are both important in deciding which approach would be best.

Raidho36
Member #14,628
October 2012

For the common case, it is enough to pre-render the entire map with an orthographic projection (oh wait, we're talking about 2D, right?) into a small bitmap and keep it as an underlay, while drawing the units' icons on top of it. Fog of war is done by adding another layer on top.

If your terrain changes frequently, you can simply re-render the changed portion into the minimap underlay bitmap.
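
A minimal sketch of re-rendering only the changed portion. To keep it compilable without Allegro, `MinimapUnderlay` is a hypothetical CPU-side pixel buffer standing in for the underlay bitmap. The 2-pixel cell on a 3-pixel pitch is taken from the first post, and everything else is an assumption. With a real ALLEGRO_BITMAP the same update would presumably be done by targeting the minimap bitmap, drawing the one cell, and switching back to the backbuffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side stand-in for the minimap underlay bitmap.
class MinimapUnderlay {
public:
    static constexpr int kCellPitch = 3; // pixels between cell origins
    static constexpr int kCellSize = 2;  // drawn size of one cell

    MinimapUnderlay(int fieldWidth, int fieldHeight)
        : width_(fieldWidth * kCellPitch),
          height_(fieldHeight * kCellPitch),
          pixels_(static_cast<std::size_t>(width_) * static_cast<std::size_t>(height_), 0u)
    {
    }

    // Re-renders only the pixels that belong to cell (i, j); the rest of
    // the underlay is left untouched.
    void renderCell(int i, int j, std::uint32_t rgba)
    {
        assert(i >= 0 && j >= 0);
        assert(i * kCellPitch < width_ && j * kCellPitch < height_);
        for (int dy = 0; dy < kCellSize; ++dy) {
            for (int dx = 0; dx < kCellSize; ++dx) {
                pixels_[index(i * kCellPitch + dx, j * kCellPitch + dy)] = rgba;
            }
        }
        pixelsWritten_ += static_cast<std::size_t>(kCellSize * kCellSize);
    }

    std::uint32_t pixel(int x, int y) const { return pixels_[index(x, y)]; }
    int width() const { return width_; }
    int height() const { return height_; }
    // Total pixels touched so far, to show how little work an update costs.
    std::size_t pixelsWritten() const { return pixelsWritten_; }

private:
    std::size_t index(int x, int y) const
    {
        return static_cast<std::size_t>(y) * static_cast<std::size_t>(width_) +
               static_cast<std::size_t>(x);
    }

    int width_;
    int height_;
    std::vector<std::uint32_t> pixels_;
    std::size_t pixelsWritten_ = 0;
};
```

When the player builds something on cell (5, 7), a single `renderCell(5, 7, colour)` call touches four pixels instead of redrawing all 1600 cells.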
