|
Primitive Addon ridiculously slow? |
EternalGames
Member #14,603
October 2012
|
Hey guys,
void State_Playing::DrawMinimap()
{
    for(int i = 0; i < fieldWidth; i++) // fieldWidth is 40
    {
        for(int j = 0; j < fieldHeight; j++) // fieldHeight is 40
        {
            al_draw_filled_rectangle(screenWidth-130+i*3, screenHeight-130+j*3, screenWidth-130+i*3+2, screenHeight-130+j*3+2, GetColor(map.map.at(j+i*fieldWidth)));
        }
    }
}
My question is firstly why the al_draw_filled_rectangle() function costs so much and secondly, what a better approach to draw the minimap would be. Thanks for any advice to come.
|
Kris Asick
Member #1,424
July 2001
|
The Primitives addon functions have a HUGE overhead to call, so the fewer calls you make, the better. Don't give up hope though! There is a solution, because I ran into a similar problem. That's how I learned about this in the first place.
What you want to do is set up a massive number of triangles in an array of ALLEGRO_VERTEX structures, then make a single call to al_draw_prim() and feed it the array you've created, using the ALLEGRO_PRIM_TRIANGLE_LIST mode. Yes, this means you'll have to draw rectangles as pairs of triangles.
Your other option is to create a blank white texture bitmap and use al_draw_tinted_bitmap() or any of those functions in between calls to al_hold_bitmap_drawing(true) and al_hold_bitmap_drawing(false). That should give you lightning-fast results as well.
Just to warn you though: the CPU is the major hindrance to doing 2D hardware-accelerated graphics. Drawing 40x40 tiles per frame is 1600 drawing calls per frame. My AMD Athlon 64 2.0 GHz Dual-Core system can handle about 10,000 calls to al_draw_bitmap() functions per frame before the framerate dips. Yeah, you're still a good ways away from that, but you should determine what your limit is, what kind of system you're aiming to build your game for, and code accordingly.
--- Kris Asick (Gemini) |
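The triangle-list suggestion above can be sketched roughly as follows. This is only a minimal illustration of the batching step, not code from the thread: `Vtx` is a hypothetical stand-in for ALLEGRO_VERTEX so the sketch compiles without Allegro, and `pushRect`, `buildMinimap`, `originX` and `originY` are made-up names. The 2 px cell on a 3 px pitch mirrors the original loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ALLEGRO_VERTEX (assumption, so this builds without Allegro).
struct Vtx {
    float x, y, z;
    int type;  // tile type; a real ALLEGRO_VERTEX would carry an ALLEGRO_COLOR instead
};

// Appends two triangles (six vertices) covering the rectangle (x1,y1)-(x2,y2).
inline void pushRect(std::vector<Vtx>& out, float x1, float y1,
                     float x2, float y2, int type) {
    const Vtx a{x1, y1, 0.0f, type}, b{x2, y1, 0.0f, type};
    const Vtx c{x2, y2, 0.0f, type}, d{x1, y2, 0.0f, type};
    out.insert(out.end(), {a, b, c, a, c, d});
}

// Builds the whole minimap as one triangle list: 2 px cells on a 3 px pitch.
inline std::vector<Vtx> buildMinimap(const std::vector<int>& tiles,
                                     int fieldWidth, int fieldHeight,
                                     float originX, float originY) {
    assert(tiles.size() == static_cast<std::size_t>(fieldWidth) * fieldHeight);
    std::vector<Vtx> out;
    out.reserve(tiles.size() * 6);
    for (int i = 0; i < fieldWidth; ++i) {
        for (int j = 0; j < fieldHeight; ++j) {
            const float x = originX + i * 3.0f;
            const float y = originY + j * 3.0f;
            // Column-major index; identical to the original j + i*fieldWidth
            // when the map is square (40x40).
            pushRect(out, x, y, x + 2.0f, y + 2.0f, tiles[j + i * fieldHeight]);
        }
    }
    return out;
}
```

In an actual Allegro program the array would hold ALLEGRO_VERTEX entries with a colour per vertex, and the whole batch would go out in one call along the lines of `al_draw_prim(v.data(), NULL, NULL, 0, (int)v.size(), ALLEGRO_PRIM_TRIANGLE_LIST)`. For a 40x40 map that is 9600 vertices in a single draw call instead of 1600 separate calls.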
Arthur Kalliokoski
Second in Command
February 2005
|
Lots of times, slow rendering is the result of bitmaps getting created before the display, but that wouldn't apply to primitives, would it? Maybe you could try straight OpenGL and use GL_TRIANGLES or whatever. They all watch too much MSNBC... they get ideas. |
SiegeLord
Member #7,827
October 2006
|
Kris Asick said: The Primitives addon functions have a HUGE overhead to call, so the less you have to call them, the better. I don't get where this comes from. Here's a simple benchmark comparing pure OpenGL and al_draw_filled_rectangle:
#include <allegro5/allegro.h>
#include <allegro5/allegro_primitives.h>
#include <allegro5/allegro_opengl.h>
#include <stdio.h>

const int N = 10000; // number of frames
const int M = 1000;  // rectangles drawn per frame

int main()
{
    al_init();
    al_init_primitives_addon();
    auto d = al_create_display(800, 600);

    // One quad, drawn as a 4-vertex triangle fan
    ALLEGRO_VERTEX vtxs[4];
    for(int ii = 0; ii < 4; ii++)
    {
        vtxs[ii].z = 0;
        vtxs[ii].color = al_map_rgb_f(1, 0, 1);
    }
    vtxs[0].x = 10;
    vtxs[0].y = 10;

    vtxs[1].x = 10;
    vtxs[1].y = 200;

    vtxs[2].x = 200;
    vtxs[2].y = 200;

    vtxs[3].x = 200;
    vtxs[3].y = 10;

    double time;
    time = al_get_time();

    // Pure OpenGL path: set up the vertex arrays once, then draw
    glEnableClientState(GL_COLOR_ARRAY);
    glEnableClientState(GL_VERTEX_ARRAY);
    glBindTexture(GL_TEXTURE_2D, 0);

    glVertexPointer(3, GL_FLOAT, sizeof(ALLEGRO_VERTEX), &vtxs[0].x);
    glColorPointer(4, GL_FLOAT, sizeof(ALLEGRO_VERTEX), &vtxs[0].color.r);

    for(int ii = 0; ii < N; ii++)
    {
        al_clear_to_color(al_map_rgb_f(0.5, 0.5, 0.5));
        for(int jj = 0; jj < M; jj++)
        {
            glDrawArrays(GL_TRIANGLE_FAN, 0, 4);
        }
        al_flip_display();
    }

    glDisableClientState(GL_COLOR_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);

    printf("Pure OpenGL: %f s\n", al_get_time() - time);

    time = al_get_time();

    // Primitives addon path: same rectangle, same number of calls
    for(int ii = 0; ii < N; ii++)
    {
        al_clear_to_color(al_map_rgb_f(0.5, 0.5, 0.5));
        for(int jj = 0; jj < M; jj++)
        {
            al_draw_filled_rectangle(10, 10, 200, 200, al_map_rgb_f(1, 1, 1));
        }
        al_flip_display();
    }

    printf("al_draw_filled_rectangle: %f s\n", al_get_time() - time);

    return 0;
}
Pure OpenGL: 42.320089 s
al_draw_filled_rectangle: 54.432403 s
So yes, there's some overhead, but it's not like it doubles the run time. It's not the primitives addon that's slow, it's OpenGL. "For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18 |
Thomas Fjellstrom
Member #476
June 2000
|
There is a lot of overhead in the setup for calls to driver functions. You want to minimize it by sending all the data at once, rather than some for each rectangle. Bit of a pain, but that's how the hardware prefers to be used. -- |
Kris Asick
Member #1,424
July 2001
|
SiegeLord said: I don't get where this comes from. When I first tried using the primitives system for doing my tile mapping engine my framerate was about 2 FPS. Mind you, that was making thousands of calls to al_draw_prim() per frame, not al_draw_filled_rectangle(). Still, considering you can shuffle an entire array to al_draw_prim() in a single call, that's probably the fastest way to go about it. Also Siege, your test code is somewhat flawed since al_flip_display() will cause delays in the timing if your system is vsyncing by default. (Like my system. In fact, it's difficult to get it to NOT vsync, even when disabling it in my 3D card settings.) One other thought... Would making multiple calls to al_draw_filled_rectangle() actually get sped up if done while al_hold_bitmap_drawing() is in effect? --- Kris Asick (Gemini) |
SiegeLord
Member #7,827
October 2006
|
Kris Asick said: Also Siege, your test code is somewhat flawed since al_flip_display() will cause delays in the timing if your system is vsyncing by default. (Like my system. In fact, it's difficult to get it to NOT vsync, even when disabling it in my 3D card settings.)
It is notoriously difficult to benchmark graphical things, since oftentimes the main computation does not occur unless you flip the display. Ultimately you need to turn off vsync to get any reasonable timings.
Quote: Would making multiple calls to al_draw_filled_rectangle() actually get sped up if done while al_hold_bitmap_drawing() is in effect?
al_hold_bitmap_drawing() has nothing to do with the primitives addon. Currently the fastest way to draw dynamic data is as you explained in your post: a giant array of triangles. The fastest way to draw static data is a relatively new feature called vertex buffers (al_create_vertex_buffer/al_draw_vertex_buffer). Most of the slowdown seems to come from a call to _al_opengl_set_blender inside al_draw_prim... I don't know why it's necessary.
"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18 |
Elias
Member #358
May 2000
|
I get this:
Pure OpenGL: 15.142891
al_draw_filled_rectangle: 15.121058
I almost feel like I did something wrong (vsync is off however, with vsync it would run for 24 hours). But it means in my case al_draw_filled_rectangle is exactly as fast as pure OpenGL. So there is basically no overhead (if something didn't go wrong with the benchmark in my case). This is all with glDrawArrays calls of course. Using vertex buffers it would be a lot faster. I think it would be interesting having a test for that as well. And in general, it would be useful if we had a folder in Allegro with lots of tests like these. They could then be run on various hardware after each commit and we'd immediately notice regressions. -- |
Thomas Fjellstrom
Member #476
June 2000
|
The overhead of the GL draw commands themselves probably outweigh any overhead we have in the primitive addon. -- |
Kris Asick
Member #1,424
July 2001
|
I don't doubt it, but the primary thing that speeds up al_draw_bitmap() is deferred drawing. If the primitives add-on doesn't do any deferred drawing then it's likely just as GPU/CPU expensive as running numerous al_draw_bitmap() commands without deferred drawing, due to the way hardware-accelerated draw operations are handled in the first place. Plus, al_draw_prim() is definitely doing something processor intensive because it doesn't take very many calls to it to kill the framerate, yet calling it just once per frame with a massive array of data is perfectly fine. --- Kris Asick (Gemini) |
Thomas Fjellstrom
Member #476
June 2000
|
I tried thinking of a way to do deferred drawing properly with the primitives addon, but it's incredibly hard, if not impossible, to do right (and have it perform better than it does now). Kris Asick said: Plus, al_draw_prim() is definitely doing something processor intensive because it doesn't take very many calls to it to kill the framerate, yet calling it just once per frame with a massive array of data is perfectly fine. Two possibilities: I did some silly benchmarking on my laptop a while back, and any time some state was changed, or some drawing commands hit the driver, a lot of time was spent in Mesa (libGL) and the driver (intel-drv i915) doing a lot of setup. Basically GL and the hardware itself really prefer that you blast them with as much data as you can in one go, rather than a bunch of little bits. -- |
SiegeLord
Member #7,827
October 2006
|
Kris Asick said: Plus, al_draw_prim() is definitely doing something processor intensive because it doesn't take very many calls to it to kill the framerate, yet calling it just once per frame with a massive array of data is perfectly fine. But the question is, would pure OpenGL be any better? So far the answer points to no. Incidentally, I did try to implement deferred drawing for primitives too, but there was no giant improvement that I could see. Heck, I didn't even see any improvement for the deferred drawing with bitmaps. As I said, these things are very hard to benchmark, and often times you're shooting in the dark. "For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18 |
Kris Asick
Member #1,424
July 2001
|
Back on topic... I just took another look at the original code and noticed that the colour value is pulled from a special GetColor() function. It's entirely possible the bottleneck for the drawing function is in there and not the fault of al_draw_filled_rectangle()... This would be especially true if GetColor() is trying to pull colour data from a video bitmap. --- Kris Asick (Gemini) |
Thomas Fjellstrom
Member #476
June 2000
|
SiegeLord said: Heck, I didn't even see any improvement for the deferred drawing with bitmaps. And you're sure you tested where all of the bitmaps were in one single texture vs each bitmap with their own texture? It makes a measurable difference on all hardware that I know of. -- |
EternalGames
Member #14,603
October 2012
|
Kris Asick said: Back on topic... I just took another look at the original code and noticed that the colour value is pulled from a special GetColor() function. It's entirely possible the bottleneck for the drawing function is in there and not the fault of al_draw_filled_rectangle()...
Haha no, the GetColor function is actually extremely simple:
ALLEGRO_COLOR State_Playing::GetColor(int type)
{
    switch(type)
    {
        case Forest: return al_map_rgb(0,80,0);
        case Lava: return al_map_rgb(160,38,40);
        case Water: return al_map_rgb(0,0,100);
        case Mountain: return al_map_rgb(120,120,120);
        case Sand: return al_map_rgb(255,255,100);
        case Plains: return al_map_rgb(0,200,0);
        case Building: return al_map_rgb(0,0,0);
    }
}
Anyways, I changed my code so that it saves the minimap as a single bitmap and now needs only a single drawing call. The map will only be updated when needed, and the positions of enemies will just be drawn separately on top, if I decide to implement that feature. This way my code is fast enough: it went from 11 million CPU cycles to between 20,000 and 60,000 (sometimes more, but I guess that's because Windows does some stuff in between).
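The "render once, redraw only when needed" approach described above might look something like the sketch below. It is a guess at the general shape, not the poster's actual code: `MinimapCache` and its members are hypothetical names, and the pixel vector stands in for the cached ALLEGRO_BITMAP so that the example runs without Allegro.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical cache (assumption): pixels_ plays the role of the minimap bitmap.
class MinimapCache {
public:
    MinimapCache(int w, int h)
        : w_(w), h_(h),
          tiles_(static_cast<std::size_t>(w) * h, 0u),
          pixels_(static_cast<std::size_t>(w) * h, 0u) {}

    // Changing a tile only marks the cache dirty; nothing is redrawn yet.
    void setTile(int x, int y, std::uint32_t color) {
        assert(x >= 0 && x < w_ && y >= 0 && y < h_);
        std::uint32_t& t = tiles_[static_cast<std::size_t>(y) * w_ + x];
        if (t != color) {
            t = color;
            dirty_ = true;
        }
    }

    // Called every frame; the expensive rebuild runs only when something changed.
    const std::vector<std::uint32_t>& pixels() {
        if (dirty_) {
            pixels_ = tiles_;  // stand-in for re-rendering all the rectangles
            ++rebuilds_;
            dirty_ = false;
        }
        return pixels_;
    }

    int rebuilds() const { return rebuilds_; }

private:
    int w_, h_;
    std::vector<std::uint32_t> tiles_, pixels_;
    bool dirty_ = true;  // force one initial render
    int rebuilds_ = 0;
};
```

With Allegro, the rebuild step would presumably target the cached bitmap via al_set_target_bitmap(), draw the rectangles into it, and switch back to the display's backbuffer; each frame then costs one al_draw_bitmap() call.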
|
Kris Asick
Member #1,424
July 2001
|
EternalGames said: Anyways I changed my code so, that it saves the minimap as a single bitmap and now only has a single drawing call. Always the best approach if your bitmap size isn't bigger than (or not much bigger than) the screen itself. --- Kris Asick (Gemini) |
EternalGames
Member #14,603
October 2012
|
Kris Asick said: Always the best approach if your bitmap size isn't bigger than (or not much bigger than) the screen itself.
What's a better approach for large bitmaps? I happen to have saved the whole image for the map (not the minimap) in a single texture, which is about 5000 x 5000 pixels. Is there a better way to do it?
|
Trent Gamblin
Member #261
April 2000
|
You might want to break it up into sections. 5000 is quite big to some GPUs. I imagine there are some GPUs still in use that only support 4096x4096 textures, and every ALLEGRO_BITMAP is backed by a texture.
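Breaking the map into sections comes down to a little arithmetic, sketched below. The names `Section` and `splitIntoSections` are hypothetical, and the section size is a free choice; 1024 is used here purely as an example value below the 4096 limit mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One section of the big map, in map pixel coordinates (hypothetical type).
struct Section {
    int x, y, w, h;
};

// Splits a mapW x mapH image into sections no larger than maxSize on either axis.
// Sections on the right and bottom edges are clipped to the map bounds.
inline std::vector<Section> splitIntoSections(int mapW, int mapH, int maxSize) {
    assert(mapW > 0 && mapH > 0 && maxSize > 0);
    std::vector<Section> out;
    for (int y = 0; y < mapH; y += maxSize) {
        for (int x = 0; x < mapW; x += maxSize) {
            out.push_back({x, y,
                           std::min(maxSize, mapW - x),
                           std::min(maxSize, mapH - y)});
        }
    }
    return out;
}
```

For a 5000x5000 map and 1024-pixel sections this gives a 5x5 grid, with the last row and column 904 pixels wide. Each section would get its own bitmap, and only the sections overlapping the visible area need to be drawn each frame.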
|
Kris Asick
Member #1,424
July 2001
|
Also consider how big this bitmap actually appears on-screen. If your mini-map is shown at a miniature size, then shrink it down accordingly so it takes up less video memory. (A 4096x4096 bitmap takes up 64 MB of video RAM!) It really depends on how this mini-map shows up on-screen in the first place and how much memory vs. processing power you want to use, 'cause there are millions of approaches you could take. Context and programming skill are both important in deciding which approach would be best. --- Kris Asick (Gemini) |
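The 64 MB figure can be checked with a quick calculation, assuming an uncompressed format with 4 bytes per pixel (32-bit RGBA). That format is an assumption here, and drivers may add padding or mipmaps on top. `textureBytes` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Raw size of an uncompressed texture, assuming 4 bytes per pixel by default.
constexpr std::uint64_t textureBytes(std::uint64_t width, std::uint64_t height,
                                     std::uint64_t bytesPerPixel = 4) {
    return width * height * bytesPerPixel;
}

// 4096 x 4096 x 4 bytes = 67,108,864 bytes = 64 MiB, matching the figure above.
static_assert(textureBytes(4096, 4096) == 64ull * 1024 * 1024,
              "4096x4096 RGBA should be 64 MiB");

// The 5000 x 5000 map from the question: 100,000,000 bytes, roughly 95 MiB.
static_assert(textureBytes(5000, 5000) == 100000000ull,
              "5000x5000 RGBA should be 100,000,000 bytes");
```

By the same arithmetic, the 40x40 minimap rendered at 3 pixels per tile (120x120) needs only 57,600 bytes.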
Raidho36
Member #14,628
October 2012
|
For the common case, it is enough to pre-render the entire map with an orthographic projection (oh wait, we're talking about 2D, right?) into a small bitmap and keep it as an underlay, while drawing units' icons on top of it. Fog of war is done by having another extra layer. If your terrain changes frequently, you can simply re-render the changed portion into the minimap underlay bitmap. |
|