Speeding up drawing from a tile atlas.

Dakota West

Member #15,138

May 2013

Usually when I have a question I can find an answer in the docs or on the forums because someone asked it before me, but now I'm kind of stumped. I've been browsing the forums looking for solutions for a while, and I'm nearly positive the error is in my code and not the system, because a few months ago I implemented something similar that had no performance issues at all using the same version of Allegro (5.0.8).

I'm drawing to the screen from a tile atlas, organized into an array of sub-bitmaps. It is extremely slow. Like, 15/16 fps slow on Windows and 26 fps slow on Arch. This is on a machine that can run Battlefield 3 on Ultra at 60 fps, just to put in perspective exactly how wrong I went somewhere in my code.

I draw to the screen with this block

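```c
al_hold_bitmap_drawing(true);   /* defer and batch the draws below */
for (int i = x; i < x + tiwi; i++)            /* visible columns */
        for (int j = y; j < y + tihi; j++) {  /* visible rows */
                fg = ch->fore[i][j];
                bg = ch->back[i][j];
                /* map cell (i, j) -> screen position relative to the view */
                block_draw(fg, bg, (i - x) * scale,
                           (j - y) * scale, scale);
        }
al_hold_bitmap_drawing(false);  /* flush the batched draws */
```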

x and y indicate position on the map, and tihi abbreviates "tiles high" (screen size) and tiwi abbreviates "tiles wide". fg and bg are numbers, an id for what tile to draw, and then scale is the size to draw them at.
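A sketch of how tiwi and tihi might be derived from the screen size, assuming the view should cover the whole display even when the display is not an exact multiple of the tile size. tiles_to_cover is a hypothetical helper, not something from the original program.

```c
/* Hypothetical helper: how many tiles of `scale` pixels are needed to
   cover `pixels` pixels. Rounds up so a partial tile at the edge is
   still drawn. */
static int tiles_to_cover(int pixels, int scale)
{
        return (pixels + scale - 1) / scale;
}
```

For example, tiles_to_cover(1920, 64) gives 30, and a 1080-pixel-high display at the same scale needs 17 rows.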

The function being called is below.

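```c
void block_draw (block fg, block bg, int x, int y, int scale)
{
    /* Translucent foreground over a non-empty background: draw the
       background first, then the foreground tinted by its opacity.
       Note: with Allegro 5's default premultiplied-alpha blender a fade
       is usually written al_map_rgba_f(a, a, a, a). */
    if ((profiles[fg.id].opacity < BLOCK_OPAQUE) && bg.id) {
            al_draw_scaled_bitmap(backs[bg.id], 0, 0, BLOCK_SIZE,
                                  BLOCK_SIZE, x, y, scale, scale, 0);
            if (fg.id)
                    al_draw_tinted_scaled_bitmap(sprites[fg.id],
                            al_map_rgba_f(1, 1, 1, profiles[fg.id].opacity),
                            0, 0, BLOCK_SIZE, BLOCK_SIZE, x, y, scale,
                            scale, 0);
    } else
            /* Opaque foreground (or no background): one scaled draw */
            al_draw_scaled_bitmap(sprites[fg.id], 0, 0, BLOCK_SIZE,
                                  BLOCK_SIZE, x, y, scale, scale, 0);
}
```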

Currently all the blocks I've been testing with go straight to the "else" clause in block_draw.

As for modes, I took some suggestions I found on the forums and implemented flags that should help.

```c
int init ()
{
        if(!al_init())
                return 1;

        al_init_image_addon();
        al_install_keyboard();
        al_init_font_addon();
        al_init_ttf_addon();
        /* These only affect displays/bitmaps created after this point.
           A bitmap can only become a video bitmap if a display exists
           when it is created or loaded. */
        al_set_new_display_flags(ALLEGRO_OPENGL);
        al_set_new_bitmap_flags(ALLEGRO_VIDEO_BITMAP);

        if (enet_initialize())
                return 2;

        atexit(enet_deinitialize);

        return 0;
}
```

Without the ALLEGRO_OPENGL flag I was getting about a frame a minute, so that's really helpful.

The bitmaps used in block_draw are loaded with this:

```c
static ALLEGRO_BITMAP *sheet;
static ALLEGRO_BITMAP *sprites[BLOCK_COUNT];
static ALLEGRO_BITMAP *backs[BLOCK_COUNT];

...

if (!(sheet = al_load_bitmap(name)))
        return 1;

/* Build a two-row atlas: row 0 holds the original sprites, row 1 a
   darkened copy used for backgrounds. */
ALLEGRO_BITMAP *temp = al_get_target_bitmap();
ALLEGRO_BITMAP *map = al_create_bitmap(BLOCK_COUNT * BLOCK_SIZE,
                                       BLOCK_SIZE * 2);
if (!map) {
        al_destroy_bitmap(sheet);
        return 1;
}
al_set_target_bitmap(map);
al_draw_bitmap(sheet, 0, 0, 0);
al_draw_tinted_bitmap(sheet, al_map_rgb(100, 100, 100), 0, BLOCK_SIZE, 0);

if (temp)
        al_set_target_bitmap(temp);

al_destroy_bitmap(sheet);
sheet = map;

/* Sub-bitmaps share the atlas's pixels; nothing is copied. */
memset(sprites, 0, sizeof(ALLEGRO_BITMAP*) * BLOCK_COUNT);
for (int i = 0; i < BLOCK_COUNT; i++)
        sprites[i] = al_create_sub_bitmap(sheet, i * BLOCK_SIZE, 0,
                                          BLOCK_SIZE, BLOCK_SIZE);

memset(backs, 0, sizeof(ALLEGRO_BITMAP*) * BLOCK_COUNT);
for (int i = 0; i < BLOCK_COUNT; i++)
        backs[i] = al_create_sub_bitmap(sheet, i * BLOCK_SIZE,
                                        BLOCK_SIZE,
                                        BLOCK_SIZE, BLOCK_SIZE);
```

I am going to continue testing things I'm reading in docs/forums, and I will post back on the thread if I find anything in case other people have this problem in the future.

Jeff Bernard

Member #6,698

December 2005

Do you have the same performance problem if you use al_draw_bitmap_region on the atlas instead of creating sub bitmaps? (You can use al_scale_transform/al_use_transform if it's crucial you have some non-one scale.)
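A minimal sketch of the source-rectangle arithmetic this suggestion needs, assuming BLOCK_SIZE is 64 (the tile size mentioned later in the thread) and the two-row atlas layout built by the loader in the first post. The type src_rect and the helper atlas_rect are hypothetical; their fields correspond to the sx, sy, sw, sh arguments of al_draw_bitmap_region(bitmap, sx, sy, sw, sh, dx, dy, flags).

```c
enum { BLOCK_SIZE = 64 };  /* assumed tile size */

/* Hypothetical: a source rectangle inside the atlas */
typedef struct { int sx, sy, sw, sh; } src_rect;

/* Row 0 of the atlas holds foreground sprites, row 1 the darkened
   backgrounds; tile `id` starts at x = id * BLOCK_SIZE in either row. */
static src_rect atlas_rect(int id, int background)
{
        src_rect r = { id * BLOCK_SIZE, background ? BLOCK_SIZE : 0,
                       BLOCK_SIZE, BLOCK_SIZE };
        return r;
}
```

With this, the else branch of block_draw would become a single region draw from sheet using atlas_rect(fg.id, 0).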

--
I thought I was wrong once, but I was mistaken.

Dakota West

Member #15,138

May 2013

No, using the draw region doesn't change performance in any measurable way. I will test the transformation method. Since all the tiles drawn in a given frame would be scaled the same way, I figure I should set it on the target bitmap and then unset it once everything is drawn?
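That plan matches how Allegro 5 transforms are normally used: save the current one with al_copy_transform(&old, al_get_current_transform()), build a scale with al_identity_transform and al_scale_transform, activate it with al_use_transform, draw, then call al_use_transform(&old). The sketch below only checks the arithmetic behind that, assuming BLOCK_SIZE is 64: with a factor of scale / BLOCK_SIZE in effect, drawing map column n at n * BLOCK_SIZE lands on the same pixel as the manual (i - x) * scale. All function names are hypothetical.

```c
enum { BLOCK_SIZE = 64 };  /* assumed natural tile size */

/* Factor passed to al_scale_transform for a given on-screen tile size */
static float tile_scale_factor(int scale)
{
        return (float)scale / BLOCK_SIZE;
}

/* Where a tile drawn at its unscaled position ends up once the scale
   transform is applied */
static float transformed_pos(int cell, float factor)
{
        return (float)(cell * BLOCK_SIZE) * factor;
}

/* What the original loop computes by hand for every tile */
static float manual_pos(int cell, int scale)
{
        return (float)(cell * scale);
}
```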

ph03nix

Member #15,028

April 2013

I find this suspicious:

void block_draw (block fg, block bg, int x, int y, int scale){

It seems you are passing the block objects by value instead of by reference or as a pointer, meaning you are making copies of them. If you do this hundreds of times per iteration, it could cause your program to slow down (and even malfunction).

Dakota West

Member #15,138

May 2013

ph03nix said:

The block struct currently only has one field, which is an unsigned 8-bit integer, so I'm not sure that's the issue, but I'm running a test now.
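A small sketch of that reasoning, using a hypothetical reconstruction of the struct (one uint8_t field named id, as described above): copying such a struct moves a single byte, which is no more than passing a pointer to it, so by-value parameters are an unlikely source of the slowdown.

```c
#include <stdint.h>

/* Hypothetical reconstruction of the one-field block struct */
typedef struct { uint8_t id; } block;

/* Both forms read the same id; the by-value copy is one byte wide */
static uint8_t id_by_value(block b)          { return b.id; }
static uint8_t id_by_pointer(const block *b) { return b->id; }
```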

Damn, no difference.

ph03nix

Member #15,028

April 2013

Quite the mystery then, because I see nothing wrong with the code. Does it still run slowly if you draw everything but the tiles? (assuming you draw other things as well)

Raidho36

Member #14,628

October 2012

~~To me this seems like the biggest reason for it to be slow.~~ Use pointers anyway.

Also, make sure you don't call your render functions too much. ~~Do not assume that you can simply render the entire scene at once just because it will get clipped to the display area.~~ That was already there; I hadn't noticed, sorry. Anyway, even something as simple as rendering a 64x64 bitmap grid at 60 fps causes big-time performance problems.

Being serious is stupid, I'm done with it.

Dakota West

Member #15,138

May 2013

The bitmaps are 64x64 pixels natively, but I want their render size to be adjustable. Is it still hard to render them even when they're resized, given that their natural resolution is 64x64? For context, it renders 30 tiles across and 10 tiles down (although I hoped that once I figured this out, I could render a bigger space).

I will experiment with smaller textures.

And ph03nix no, I am only drawing the tiles. It's only a map viewer.

ph03nix

Member #15,028

April 2013

Rendering 300 64x64 bitmaps should not slow down to 15 fps from 60, or be slow at all. Perhaps you are somehow converting the bitmaps to memory bitmaps. Try calling al_get_bitmap_flags on your tiles to see if they are memory bitmaps.
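A sketch of that check, kept library-free so it runs on its own: the hypothetical count_with_flag counts entries that have every bit of mask set. In a real program you would fill the array with al_get_bitmap_flags(sprites[i]) and pass ALLEGRO_MEMORY_BITMAP as the mask; any non-zero result suggests those tiles are going through Allegro's slow memory-bitmap drawing path.

```c
#include <stddef.h>

/* Hypothetical helper: count entries whose flags include all bits of
   `mask` (e.g. ALLEGRO_MEMORY_BITMAP when used with Allegro). */
static size_t count_with_flag(const int *flags, size_t n, int mask)
{
        size_t count = 0;
        for (size_t i = 0; i < n; i++)
                if ((flags[i] & mask) == mask)
                        count++;
        return count;
}
```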

Dakota West

Member #15,138

May 2013

@ph03nix You got it. They are somehow becoming memory bitmaps. I don't know how though. I will update soon.

Update: The tile atlas was loaded after setting the new bitmap flags to ALLEGRO_VIDEO_BITMAP, but before creating the display, so there was no display for it to become a video bitmap on, and it ended up as a memory bitmap. I missed something really obvious; sorry for wasting your time, guys. I can now display them at their natural size in a 100x200 grid with no problem, as well as export a 500x500-tile image in under a second.

Raidho36

Member #14,628

October 2012

I meant a 64x64 grid of arbitrary bitmaps, that is, rendering over 4000 bitmaps at once. Even such a relatively small tile grid turns out to be a huge deal. That was a real surprise to me: since my computer ran modern games at "ultra" just fine, I really expected it to munch through the whole thing like it was nothing. And that was with a very plain loop; all it did was increment values and draw "as is", with no computations going on alongside. Now that I think of it, I probably had memory bitmaps, too. Still, if you have a lot of small tiles, you're more likely to be drawing too many of them, in terms of computations per tile. For that part I really suggest using transformations, since you do a lot of computing within the loop only to do what a transformation does. Even if your video card renders it faster than you can supply it, that'll still save you some processing.
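To put the two grids side by side, a small sketch of the per-frame call count, assuming block_draw's cost from the first post: one drawing call per tile, plus one more for each tile whose translucent foreground sits over a background. The helper names are hypothetical.

```c
/* Drawing calls per frame for a cols x rows view, where `layered`
   tiles take block_draw's two-call path (background + tinted
   foreground). */
static long calls_per_frame(long cols, long rows, long layered)
{
        return cols * rows + layered;
}

static long calls_per_second(long per_frame, long fps)
{
        return per_frame * fps;
}
```

A 64x64 grid is 4096 calls per frame (245,760 per second at 60 fps), against 300 for the 30x10 view. For comparison, the 100x200 view Dakota reports running without trouble after the memory-bitmap fix is 20,000 calls per frame by the same count.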

Thomas Fjellstrom

Member #476

June 2000

Drawing any kind of tilemap will be nothing for modern hardware and Allegro 5. If it is slow, you've done something wrong.

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

Raidho36

Member #14,628

October 2012

Modern hardware relies on preloading to run fast, because transferring data over the outdated PCI-E interface takes way longer than processing it internally. Thus, transferring data is the bottleneck, and Allegro doesn't handle it well as of the 5.0.8 branch. Maybe I was missing something, but last time I checked, instead of preloading into VBOs and rendering on demand, Allegro was passing vertex data to the video card via legacy functions, and it was doing so for every single bitmap blitting operation. It was also handling transformation matrices by itself rather than making the video card do it.

Thomas Fjellstrom

Member #476

June 2000

It handles it fine. All bitmaps are loaded into video memory as textures (as long as they are video bitmaps), so they don't need to be transferred when drawing, and if you set up deferred/held drawing, the geometry is batched up and sent in as big a batch as possible to cut down on GPU driver calls and transfers to the GPU. IIRC it uses vertex arrays to transfer it over, and not "legacy functions" (I assume you mean glBegin/glVertex/glEnd). Held drawing is known to help performance significantly.
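A toy model of why held drawing pairs well with an atlas. This is not Allegro's implementation, only an illustration under the assumption that a pending batch is submitted whenever the source texture changes or drawing is released; since sub-bitmaps share their parent's texture, every tile cut from one atlas counts as the same texture. The type and function names are hypothetical, and each quad is counted as 6 vertices (two triangles).

```c
#include <stddef.h>

/* Hypothetical batcher state */
typedef struct {
        int current_texture;   /* texture of the pending batch, -1 if none */
        size_t pending_verts;  /* vertices waiting to be submitted */
        size_t flushes;        /* submissions, i.e. "driver calls" */
} batcher;

/* Submit whatever is pending as one batch */
static void batch_flush(batcher *b)
{
        if (b->pending_verts > 0) {
                b->flushes++;
                b->pending_verts = 0;
        }
}

/* Queue one textured quad; a texture change forces a flush first */
static void batch_draw(batcher *b, int texture)
{
        if (texture != b->current_texture) {
                batch_flush(b);
                b->current_texture = texture;
        }
        b->pending_verts += 6;
}

/* Equivalent of releasing held drawing: submit the last batch */
static void batch_release(batcher *b)
{
        batch_flush(b);
        b->current_texture = -1;
}
```

In this model 300 tiles from one atlas cost a single submission, while alternating between two separate textures costs one submission per tile.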

Using a VBO for regular allegro drawing routines isn't likely to work very well. VBO integration is something very specific to the app's own data and structure. I tried coming up with something that would defer all allegro drawing, including blits and primitives. It still wouldn't use a VBO, but even with that, it was incredibly hard, or just down right impossible to do right.

I think SiegeLord implemented deferred drawing for the primitives addon, and it didn't help performance at all.

SiegeLord

Member #7,827

October 2006

It's a fact that Allegro's drawing model is suboptimal for GPUs, even with deferred drawing tricks... a more suitable API would be more complicated and probably would be even less well received than the current API. It would require persistent objects (e.g. a persistent bitmap, rectangle etc.) and a scene graph, going completely against the A4 immediate drawing model that A5 essentially replicated.

The manual transformation bit is largely a red herring, incidentally. I've implemented a "fast" drawing library that avoids the unnecessary (for tilemaps) transformation pre-application and it wasn't magically faster... the vagaries of the GPU/driver performance drowned out any gain of that optimization. E.g. on my old GPU it had no effect (or even a paradoxical slowdown), while on my new GPU it's twice as fast.

Anyway, Allegro 5.1 has some preliminary support for VBOs in the primitives addon, so in principle somebody who wants to squeeze more performance out of their GPU could use that.

"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18
[SiegeLord's Abode][Codes]:[DAllegro5]:[RustAllegro]

Thomas Fjellstrom

Member #476

June 2000

SiegeLord said:

It's a fact that Allegro's drawing model is suboptimal for GPUs, even with deferred drawing tricks...

Sure, but it's still going to be more than fast enough for a lot of people's uses.

Speaking of scene graph type things, I started a 2d canvas lib. Probably going to be a lot faster than using a crap load of allegro primitive calls. But I never bothered to test it and I haven't worked on it in a while.

Quote:

Anyway, Allegro 5.1 has some preliminary support for VBOs in the primitives addon, so in principle somebody who wants to squeeze more performance out of their GPU could use that.

How would a VBO help over a vertex array when you have to fire all of the data at the gpu every time?

SiegeLord

Member #7,827

October 2006

Thomas Fjellstrom said:

How would a VBO help over a vertex array when you have to fire all of the data at the gpu every time?

I don't know the mechanism, but it does help on some GPUs. E.g. on mine it's 35% faster to use the VBO even if you're re-sending the entire thing every frame (as reported by ex_vertex_buffer).

GPUs are magic.

Thomas Fjellstrom

Member #476

June 2000

Or they just "forget" to optimize the vertex array stuff the same way. Driver derps.

Raidho36

Member #14,628

October 2012

Thomas, I would rather suggest uploading vertex data on bitmap loading, and then rendering the VBO vertices associated with a given bitmap, since, batched or not, Allegro gathers a vertex list and then passes it to the video card, and that's legacy. Instead, gathering should only involve building an array of indices into the preloaded, bitmap-associated vertices to render with the texture in use. I'd implement an array holding a boolean for every loaded bitmap, so the whole gathering step is narrowed down to marking certain indices to be rendered and finding contiguous runs to render more vertices at once. Of course that won't do, since the same bitmap can be rendered more than once. I didn't think it through. So on second thought, I'd implement a set of functions alongside the main rendering functions specifically to handle VBO drawing, with persistent data, etc. So rather than telling the video card "render this long-ass freshly gathered vertex list with this texture", it should tell the video card "have this long-ass vertex list generated on bitmap load and remember it, because I'll be asking you to render vertices X through Y with texture Z". Calling a VBO render function is way faster than transferring a bunch of triangles, in terms of data transmission. Texture preloading was considered an industry standard about two decades ago, so that doesn't really count. Still, I haven't looked at the code very carefully, so if it already does precisely that, then I'm sorry for putting it like this.

Thomas Fjellstrom said:

it was incredibly hard, or just down right impossible to do right

What can possibly be hard with a) uploading vertices to the video card and b) calling VBO render function later on?

Thomas Fjellstrom

Member #476

June 2000

Raidho36 said:

Thomas, I would rather suggest uploading vertex data on bitmap loading

I'm not sure how that helps? You don't even know where you're going to be drawing it at that point, or how many times.

Quote:

What can possibly be hard with a) uploading vertices to the video card and b) calling VBO render function later on?

Lets say you have 1000 objects, that have to be rendered in a certain order, how are you going to tell the gpu to render things in the right order? We're talking 2d here, but where objects have a proper zindex.

Raidho36

Member #14,628

October 2012

Thomas Fjellstrom said:

I'm not sure how that helps? You don't even know where you're going to be drawing it at that point, or how many times.

Neither do 3D games know where or how many times they will draw their 3D models, but that doesn't stop anyone from using VBOs. You must be don't getting the principle and idea of VBO if you have this kind of questions raised by it. Anyway, as for using VBOs specifically for 2D bitmaps, here's my idea of it: when you load a bitmap (or create a sub-bitmap), you upload to the video card its 4 vertex positions relative to its origin (normally the center), with texture coordinates also specified, which would be 0 and 1 for a "full" bitmap and somewhere in between for a sub-bitmap. And there you go: you're all set to render your preloaded bitmaps with a VBO. Simply call the render function on vertices 1-4 if you want to render bitmap 1, 5-8 for bitmap 2, etc. The right texture should be enabled, of course, so the existing function that gathers as many consecutive bitmaps with the same texture as possible would be handy. Then again, it may be faster to brute-force render, setting up a new texture every time it needs to change, rather than computing queues like that, so it should be an option enabled with a flag. Or preferably, Allegro should estimate by itself whether the target machine needs a software queue or can handle it in hardware just fine.
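A sketch of the per-bitmap vertex data described here, assuming a hypothetical vertex layout of position plus texture coordinates: four vertices around the bitmap's centre, with texture coordinates normalised so a full bitmap spans 0 to 1 and a sub-bitmap covers a fraction of its atlas. Counting from zero, bitmap k would own vertices 4k through 4k + 3 of one shared buffer. Nothing here uploads to a GPU; it only builds the data such a buffer would hold.

```c
#include <stddef.h>

/* Hypothetical vertex layout: position and texture coordinates */
typedef struct { float x, y, u, v; } vertex;

/* Quad for a w x h region at (sx, sy) inside an atlas_w x atlas_h
   texture. Positions are relative to the quad's centre; vertices are
   in an order that can be drawn as a triangle fan. */
static void make_subbitmap_quad(vertex out[4], float sx, float sy,
                                float w, float h,
                                float atlas_w, float atlas_h)
{
        float hw = w / 2, hh = h / 2;
        float u0 = sx / atlas_w, u1 = (sx + w) / atlas_w;
        float v0 = sy / atlas_h, v1 = (sy + h) / atlas_h;

        out[0] = (vertex){ -hw, -hh, u0, v0 };
        out[1] = (vertex){  hw, -hh, u1, v0 };
        out[2] = (vertex){  hw,  hh, u1, v1 };
        out[3] = (vertex){ -hw,  hh, u0, v1 };
}

/* Index of bitmap k's first vertex in the shared buffer */
static size_t first_vertex_of(size_t k)
{
        return 4 * k;
}
```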

Quote:

Lets say you have 1000 objects, that have to be rendered in a certain order, how are you going to tell the gpu to render things in the right order?

First obvious thought: an orthographic projection and a depth buffer. More thoughtful idea: render as normal, with the depth buffer disabled, but rather than passing new vertices over and over, simply ask the video card to render already uploaded vertices.

Thomas Fjellstrom

Member #476

June 2000

Raidho36 said:

You must be don't getting the principle and idea of VBO if you have this kind of questions raised by it.

Or you completely missed my point. You upload vertex data for N bitmaps. Doesn't really do much when you're drawing N*500. The real gain comes when you can put all of your geometry in the VBO.

Quote:

Please go implement that and let me know how it goes. Start with the Allegro 5 primitive APIs (the bitmap drawing and primitive shape drawing APIs), and make sure everything is drawn in the correct order based on the order of the calls, taking blending into account.

And please note, people are going to be calling these functions every frame.

Raidho36

Member #14,628

October 2012

Thomas Fjellstrom said:

Doesn't really do much when you're drawing N*500.

As if building an array of N * 500 * 4 vertices every frame and passing it to the video card would be any more efficient. Even if it doesn't render faster, it saves CPU time. Yes, uploading the entire drawable thing would be better, but that's not quite possible, and not done in practice: even 3D games with large open spaces split their maps into small chunks and render them as needed rather than rendering the entire thing at once, so your argument (rendering the whole thing as a single mesh) is not really valid anyway.

Quote:

Please go implement that and let me know how it goes.

Let me check my list. Data structures, UTF-32 internal strings (what kind of moron came up with the idea to store Unicode strings internally as UTF-8?), network... now I'll just add "VBO" to the list and that'll be it for now. Also, that sounded to me like an excuse not to do it.

----

Note that the whole point of using a VBO is to cut down data transmission and CPU computation overhead (it also saves a tiny bit of RAM), so that the video card doesn't wait while you prepare your data: rendering a model becomes as simple as calling a single function that unfolds into a small bunch (possibly just one) of low-level calls to the video card.

Vanneto

Member #8,643

May 2007

Nobody needs excuses not to do something for Allegro... It's all voluntary. Just like you could stop talking so much and actually get to coding some of the features you think are lacking.

In capitalist America bank robs you.

Edgar Reynaldo

Major Reynaldo

May 2007

A5 is still a 2D library. Why do we need Allegro to be a 3D library? Isn't that what Ogre is for? And as for passing vertexes, you can do that yourself with OpenGL. Allegro doesn't prevent you from doing anything as far as I know. I don't see what the complaints are all about. If you have such great ideas to optimize the allegro library, by all means do so, but stop expecting everyone else to implement your ideas as if they didn't have anything else to do with their time. Stop blabbing, and start coding.

Eagle and Allegro 5 binaries | Older Allegro 4 and 5 binaries | Allegro 5 compile guide
King Piccolo will make it so Then I could just flip it with some sweet, sweet matrix math. In either case, I’ll ride this wagon until the wheels fall off. Thanks to those that keep it running