Speeding up drawing from a tile atlas.
Dakota West

Usually when I have a question I can find an answer in the docs or on the forums because someone has asked it before me, but now I'm kind of stumped. I've been browsing the forums for solutions for a while, and I'm nearly positive the error is in my code and not the system, because a few months ago I implemented something similar with the same version of Allegro (5.0.8) that had no performance issues at all.

I'm drawing to the screen from a tile atlas, organized into an array of sub-bitmaps. It is extremely slow. Like, 15-16 fps slow on Windows and 26 fps slow on Arch. This is on a machine that can run Battlefield 3 on Ultra at 60 fps, just to put in perspective exactly how wrong I went somewhere in my code.

I draw to the screen with this block:

al_hold_bitmap_drawing(true);
for (int i = x; i < x + tiwi; i++)
    for (int j = y; j < y + tihi; j++) {
        fg = ch->fore[i][j];
        bg = ch->back[i][j];
        block_draw(fg, bg, (i - x) * scale,
                   (j - y) * scale, scale);
    }
al_hold_bitmap_drawing(false);

x and y indicate position on the map; tiwi abbreviates "tiles wide" and tihi "tiles high" (the screen size in tiles). fg and bg are numeric IDs for which tile to draw, and scale is the size to draw them at.

The function it calls is below.

void block_draw (block fg, block bg, int x, int y, int scale)
{
    if ((profiles[fg.id].opacity < BLOCK_OPAQUE) && bg.id) {
        al_draw_scaled_bitmap(backs[bg.id], 0, 0, BLOCK_SIZE,
                              BLOCK_SIZE, x, y, scale, scale, 0);
        if (fg.id)
            al_draw_tinted_scaled_bitmap(sprites[fg.id],
                al_map_rgba_f(1, 1, 1, profiles[fg.id].opacity),
                0, 0, BLOCK_SIZE, BLOCK_SIZE, x, y, scale,
                scale, 0);
    } else
        al_draw_scaled_bitmap(sprites[fg.id], 0, 0, BLOCK_SIZE,
                              BLOCK_SIZE, x, y, scale, scale, 0);
}

Currently all the blocks I've been testing with go straight to the "else" clause in block_draw.

As for flags and modes, I implemented some suggestions I found on the forums that should help.

int init ()
{
    if (!al_init())
        return 1;

    al_init_image_addon();
    al_install_keyboard();
    al_init_font_addon();
    al_init_ttf_addon();
    al_set_new_display_flags(ALLEGRO_OPENGL);
    al_set_new_bitmap_flags(ALLEGRO_VIDEO_BITMAP);

    if (enet_initialize())
        return 2;

    atexit(enet_deinitialize);

    return 0;
}

Without the ALLEGRO_OPENGL flag I was getting about a frame a minute, so that's really helpful.

The bitmaps used in block_draw are loaded with this:

static ALLEGRO_BITMAP *sheet;
static ALLEGRO_BITMAP *sprites[BLOCK_COUNT];
static ALLEGRO_BITMAP *backs[BLOCK_COUNT];

...

if (!(sheet = al_load_bitmap(name)))
    return 1;

ALLEGRO_BITMAP *temp = al_get_target_bitmap();
ALLEGRO_BITMAP *map = al_create_bitmap(BLOCK_COUNT * BLOCK_SIZE,
                                       BLOCK_SIZE * 2);
al_set_target_bitmap(map);
al_draw_bitmap(sheet, 0, 0, 0);
al_draw_tinted_bitmap(sheet, al_map_rgb(100, 100, 100), 0, BLOCK_SIZE, 0);

if (temp)
    al_set_target_bitmap(temp);

al_destroy_bitmap(sheet);
sheet = map;

memset(sprites, 0, sizeof(ALLEGRO_BITMAP*) * BLOCK_COUNT);
for (int i = 0; i < BLOCK_COUNT; i++)
    sprites[i] = al_create_sub_bitmap(sheet, i * BLOCK_SIZE, 0,
                                      BLOCK_SIZE, BLOCK_SIZE);

memset(backs, 0, sizeof(ALLEGRO_BITMAP*) * BLOCK_COUNT);
for (int i = 0; i < BLOCK_COUNT; i++)
    backs[i] = al_create_sub_bitmap(sheet, i * BLOCK_SIZE,
                                    BLOCK_SIZE,
                                    BLOCK_SIZE, BLOCK_SIZE);

I am going to continue testing things I'm reading in docs/forums, and I will post back on the thread if I find anything in case other people have this problem in the future.

Jeff Bernard

Do you have the same performance problem if you use al_draw_bitmap_region on the atlas instead of creating sub bitmaps? (You can use al_scale_transform/al_use_transform if it's crucial you have some non-one scale.)

Dakota West

No, using the region draw doesn't change performance in any measurable way. I will test the transformation method. Since all the tiles drawn in a given frame are scaled the same way, I figure I can set it on the target bitmap and then unset it once everything is drawn?
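For reference, the set-once-per-frame transform pattern being discussed might look roughly like this. This is only a sketch against the code from the earlier posts (it borrows x, y, tiwi, tihi, scale, ch, sprites[] and BLOCK_SIZE from there), not a tested drop-in:

```c
/* Sketch: scale once per frame with a transform, then draw unscaled.
 * Variables (x, y, tiwi, tihi, scale, ch, sprites, BLOCK_SIZE) are
 * assumed from the code posted above. */
ALLEGRO_TRANSFORM t;
al_identity_transform(&t);
/* Everything drawn to the current target from here on is scaled. */
al_scale_transform(&t, (float)scale / BLOCK_SIZE,
                       (float)scale / BLOCK_SIZE);
al_use_transform(&t);

al_hold_bitmap_drawing(true);
for (int i = x; i < x + tiwi; i++)
    for (int j = y; j < y + tihi; j++)
        /* Unscaled draw; the transform handles the sizing. */
        al_draw_bitmap(sprites[ch->fore[i][j].id],
                       (i - x) * BLOCK_SIZE, (j - y) * BLOCK_SIZE, 0);
al_hold_bitmap_drawing(false);

/* Reset to the identity once the frame is done. */
al_identity_transform(&t);
al_use_transform(&t);
```

Unscaled draws batch better under held drawing, since they all come from the same texture and need no per-call scaling math.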

ph03nix

I find this suspicious:

void block_draw (block fg, block bg, int x, int y, int scale){

It seems you are passing the block objects by value instead of reference or as a pointer, meaning you are making copies of the block objects. If you do this hundreds of times per iteration, it could cause your program to slow down (and even malfunction)

Dakota West
ph03nix said:

It seems you are passing the block objects by value instead of reference or as a pointer, meaning you are making copies of the block objects. If you do this hundreds of times per iteration, it could cause your program to slow down (and even malfunction)

The block struct currently only has one field, which is an unsigned 8-bit integer, so I'm not sure that's the issue, but I'm running a test now.

Damn, no difference.

ph03nix

Quite the mystery then, because I see nothing wrong with the code. Does it still run slowly if you draw everything but the tiles? (assuming you draw other things as well)

Raidho36

To me that seems like the biggest reason for this thing to run slow. Use pointers anyway.

Also, make sure you don't call your render functions too often. Do not assume that you can simply render the entire scene at once just because it will get clipped to the display area. (That was already in your code; I hadn't noticed, sorry.) Anyway, even something as simple as rendering a 64x64 bitmap grid at 60 fps causes big performance problems.

Dakota West

The bitmaps are 64x64 pixels natively, but I want their render size to be adjustable. Is it still hard to render them even when resized, because their native resolution is 64x64? For context, it renders 30 tiles across and 10 tiles down (although I hoped that once I figured this out, I could render a bigger space).

I will experiment with smaller textures.

And ph03nix no, I am only drawing the tiles. It's only a map viewer.

ph03nix

Rendering 300 64x64 bitmaps should not slow down to 15 fps from 60, or be slow at all. Perhaps you are somehow converting the bitmaps to memory bitmaps. Try calling al_get_bitmap_flags on your tiles to see if they are memory bitmaps
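For anyone following along, the check ph03nix suggests is just a bitwise test on the flags. A sketch, reusing the `sheet` atlas bitmap from the loading code posted earlier:

```c
#include <stdio.h>

/* Diagnostic: did the atlas end up in video memory or system memory?
 * `sheet` is the atlas bitmap from the loading code above. */
int flags = al_get_bitmap_flags(sheet);
if (flags & ALLEGRO_MEMORY_BITMAP)
    printf("atlas is a MEMORY bitmap: every draw is software-rendered\n");
else if (flags & ALLEGRO_VIDEO_BITMAP)
    printf("atlas is a video bitmap (texture), as intended\n");
```

Sub-bitmaps share their parent's storage, so checking the parent atlas is enough.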

Dakota West

@ph03nix You got it. They are somehow becoming memory bitmaps. I don't know how though. I will update soon.

Update: The tile atlas was loaded after setting the new bitmap flags to ALLEGRO_VIDEO_BITMAP, but before creating the display. I missed something really obvious; sorry for wasting your time guys. I can now display them at their natural size, in an array of 100x200 no problem, as well as export to a 500x500 tile image in under a second.

Raidho36

I meant a grid of bitmaps sized 64x64, of arbitrary bitmaps, i.e. rendering over 4000 bitmaps at once. Even such a relatively small tile grid turns out to be a huge deal. That was a real surprise to me: my computer ran modern games at "ultra" just fine, so I really expected it to munch the whole thing like it's nothing. And that's with a very plain loop; all it did was increment values and draw "as is", with no computations alongside. Now that I think of it, I probably had memory bitmaps too. Still, if you have a lot of small tiles you're more likely to be drawing too many of them, in terms of computations per tile. For that part I really suggest you use transformations, since you do a lot of computing within the loop only to replicate what a transformation does. Even if your video card renders faster than you can supply it, that'll still save you some processing.

Thomas Fjellstrom

Drawing any kind of tilemap is nothing for modern hardware and Allegro 5. If it is slow, you've done something wrong.

Raidho36

Modern hardware relies on preloading to run fast, because transferring data over the outdated PCI-E interface takes far longer than processing it internally. Thus transferring data is the bottleneck, and Allegro doesn't handle it well as of the 5.0.8 branch. Maybe I was missing something, but last time I checked, instead of using VBO preloading and rendering on demand, Allegro was passing vertex data to the video card via legacy functions, and it was doing so for every single bitmap blitting operation. It was also handling transformation matrices by itself rather than making the video card do it.

Thomas Fjellstrom

It handles it fine. All bitmaps are loaded into video memory as textures, so they don't need to be transferred when drawing, and if you set up deferred/held drawing, the geometry is batched up and sent in as big a batch as possible to cut down on GPU driver calls and transfers to the GPU. IIRC it uses vertex arrays to transfer it, not "legacy functions" (I assume you mean glBegin/glVertex/glEnd). Held drawing is known to help performance significantly.

Using a VBO for the regular Allegro drawing routines isn't likely to work very well. VBO integration is something very specific to the app's own data and structure. I tried coming up with something that would defer all Allegro drawing, including blits and primitives. It still wouldn't use a VBO, but even with that, it was incredibly hard, or just downright impossible, to do right.

I think SiegeLord implemented deferred drawing for the primitives addon, and it didn't help performance at all.

SiegeLord

It's a fact that Allegro's drawing model is suboptimal for GPUs, even with deferred drawing tricks... a more suitable API would be more complicated and probably would be even less well received than the current API :P. It would require persistent objects (e.g. a persistent bitmap, rectangle etc.) and a scene graph, going completely against the A4 immediate drawing model that A5 essentially replicated.

The manual transformation bit is largely a red herring, incidentally. I've implemented a "fast" drawing library that avoids the unnecessary (for tilemaps) transformation pre-application and it wasn't magically faster... the vagaries of the GPU/driver performance drowned out any gain of that optimization. E.g. on my old GPU it had no effect (or even a paradoxical slowdown), while on my new GPU it's twice as fast.

Anyway, Allegro 5.1 has some preliminary support for VBOs in the primitives addon, so in principle somebody who wants to squeeze more performance out of their GPU could use that.
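For the curious, the 5.1 primitives-addon vertex-buffer API being referred to is `al_create_vertex_buffer`/`al_draw_vertex_buffer`. A sketch under assumptions: `atlas` is a hypothetical texture bitmap, the vertex values are illustrative, and note that `ALLEGRO_VERTEX` texture coordinates are in pixels:

```c
/* One 64x64 textured quad stored GPU-side (Allegro 5.1 API).
 * `atlas` is a hypothetical video bitmap used as the texture. */
float s = 64; /* tile size in pixels */
ALLEGRO_VERTEX v[4] = {
    { .x = 0, .y = 0, .z = 0, .u = 0, .v = 0, .color = al_map_rgb(255, 255, 255) },
    { .x = s, .y = 0, .z = 0, .u = s, .v = 0, .color = al_map_rgb(255, 255, 255) },
    { .x = s, .y = s, .z = 0, .u = s, .v = s, .color = al_map_rgb(255, 255, 255) },
    { .x = 0, .y = s, .z = 0, .u = 0, .v = s, .color = al_map_rgb(255, 255, 255) },
};

/* Upload once; NULL declaration means the default ALLEGRO_VERTEX layout. */
ALLEGRO_VERTEX_BUFFER *vb =
    al_create_vertex_buffer(NULL, v, 4, ALLEGRO_PRIM_BUFFER_STATIC);

/* Each frame: no vertex re-upload, just one draw call. */
al_draw_vertex_buffer(vb, atlas, 0, 4, ALLEGRO_PRIM_TRIANGLE_FAN);
```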

Thomas Fjellstrom
SiegeLord said:

It's a fact that Allegro's drawing model is suboptimal for GPUs, even with deferred drawing tricks...

Sure, but it's still going to be more than fast enough for a lot of people's uses.

Speaking of scene graph type things, I started a 2d canvas lib. Probably going to be a lot faster than using a crap load of allegro primitive calls. But I never bothered to test it ;D and I haven't worked on it in a while.

Quote:

Anyway, Allegro 5.1 has some preliminary support for VBOs in the primitives addon, so in principle somebody who wants to squeeze more performance out of their GPU could use that.

How would a VBO help over a vertex array when you have to fire all of the data at the gpu every time?

SiegeLord

Quote:

How would a VBO help over a vertex array when you have to fire all of the data at the gpu every time?

I don't know the mechanism, but it does help on some GPUs. E.g. on mine it's 35% faster to use the VBO even if you re-send the entire thing every frame (as reported by ex_vertex_buffer).

GPUs are magic.

Thomas Fjellstrom

Or they just "forget" to optimize the vertex array stuff the same way. Driver derps.

Raidho36

Thomas, I would rather suggest uploading vertex data at bitmap load time, and then asking the video card to render the VBO vertices associated with a given bitmap. Batching or not, Allegro gathers a vertex list and then passes it to the video card; that's legacy. Instead, gathering should only involve setting up an array of preloaded bitmap-associated vertex indices to render for the texture in use. I'd implemented an array that would hold render flags for all loaded bitmaps, so the whole gathering step narrows down to marking certain indices to be rendered and finding contiguous runs so that more vertices can be rendered at once. Of course that won't do, since a given bitmap may be rendered more than once; I didn't think it through. So on second thought, I'd implement a set of functions alongside the main rendering functions specifically to handle VBO drawing, with persistent data, etc. Rather than telling the video card "render this long-ass freshly gathered vertex list with this texture", it should tell the video card "keep this long-ass vertex list generated at bitmap load, because I'll be asking you to render vertices X through Y with texture Z". Calling a VBO render function is far faster than transferring a bunch of triangles, in terms of data transmission. Texture preloading was considered an industry standard about two decades ago, so that doesn't really count. Still, I haven't looked at the code too closely, so if it already does precisely that, then I'm sorry for putting it like this.

Quote:

it was incredibly hard, or just downright impossible to do right

What can possibly be hard about a) uploading vertices to the video card and b) calling a VBO render function later on?

Thomas Fjellstrom
Raidho36 said:

Thomas, I would rather suggest uploading vertex data on bitmap loading

I'm not sure how that helps? You don't even know where you're going to be drawing it at that point, or how many times.

Quote:

What can possibly be hard with a) uploading vertices to the video card and b) calling VBO render function later on?

Let's say you have 1000 objects that have to be rendered in a certain order. How are you going to tell the GPU to render things in the right order? We're talking 2d here, but where objects have a proper z-index.

Raidho36

Quote:

I'm not sure how that helps? You don't even know where you're going to be drawing it at that point, or how many times.

Neither do 3d games know at what position, or how many times, they will draw their 3d models, but that doesn't stop anyone from using VBOs. You must not be getting the principle and idea of a VBO if it raises this kind of question for you. Anyway, as for using a VBO specifically for 2d bitmaps, here's my idea of it: when you load a bitmap (or create a sub-bitmap), you upload its four vertex positions, relative to its origin (normally the center), to the video card, with texture coordinates also specified: 0 and 1 for a "full" bitmap, and somewhere between those for a sub-bitmap. And there you go: you're all set to render your preloaded bitmaps with a VBO. Simply call the render function to render vertices 1-4 if you want to render bitmap 1, 5-8 for bitmap 2, etc. The right texture has to be enabled, of course, so the existing function that gathers as many bitmaps with the same texture in a row as possible would be handy. Then again, it may be faster to brute-force render it, setting up a new texture every time it needs to change, rather than computing queues like that, so it should be an option enabled with a flag. Or preferably, Allegro should estimate by itself whether the target machine needs a software queue or can handle it in hardware just fine.

Quote:

Lets say you have 1000 objects, that have to be rendered in a certain order, how are you going to tell the gpu to render things in the right order?

First obvious thought: orthographic projection and a depth buffer. More thoughtful idea: as normal, with the depth buffer disabled, but rather than passing new vertices over and over, simply ask the video card to render the already-uploaded vertices.

Thomas Fjellstrom
Raidho36 said:

You must not be getting the principle and idea of a VBO if it raises this kind of question for you.

Or you completely missed my point. You upload vertex data for N bitmaps. That doesn't really do much when you're drawing N*500 of them. The real gain comes when you can put all of your geometry in the VBO.

Quote:

First obvious thought: orthographic projection and a depth buffer. More thoughtful idea: as normal, with the depth buffer disabled, but rather than passing new vertices over and over, simply ask the video card to render the already-uploaded vertices.

Please go implement that and let me know how it goes. Start with the Allegro 5 primitive APIs (the bitmap drawing and primitive shape drawing APIs), and make sure everything draws in the correct order based on the order of the calls, taking blending into account.

And please note, people are going to be calling these functions every frame.

Raidho36

Quote:

Doesn't really do much when you're drawing N*500.

Implying that building an array of N * 500 * 4 vertices every frame and passing it to the video card would be any more efficient. Even if it's not rendered faster, it saves CPU time. Yes, uploading the entire drawable thing would be better, but that's not quite possible, and not used in practice: even 3d games with large open spaces split their maps into small chunks and render them as needed rather than rendering the entire thing at once, so your argument (rendering the whole thing as a single mesh) is not really valid anyway.

Quote:

Please go implement that and let me know how it goes.

Let me check my list. Data structures, UTF-32 internal strings (what kind of moron came up with the idea of storing unicode strings internally as UTF-8?), networking... now I'll just add "VBO" to the list, and that'll be it for now. Also, that sounded to me like an excuse not to do it.

----

Note that the whole point of using a VBO is to cut down the data transmission rate and CPU overhead (it also saves a tiny bit of RAM) so that the video card doesn't wait while you prepare your data; this allows model rendering to be as simple as calling a single function that unfolds into a small bunch (possibly just one) of low-level calls to the video card.

Vanneto

Nobody needs excuses not to do something for Allegro... It's all voluntary. Just like you could stop talking so much and actually get to coding some of the features you think are lacking. :P

Edgar Reynaldo

A5 is still a 2D library. Why do we need Allegro to be a 3D library? Isn't that what Ogre is for? And as for passing vertices, you can do that yourself with OpenGL. Allegro doesn't prevent you from doing anything, as far as I know. I don't see what the complaints are all about. If you have such great ideas for optimizing the Allegro library, by all means do so, but stop expecting everyone else to implement your ideas as if they didn't have anything else to do with their time. Stop blabbing and start coding.

SiegeLord

Y'all really should stop replying to Raidho36.

Arthur Kalliokoski

Quote:

A5 is still a 2D library. Why do we need Allegro to be a 3D library? Isn't that what Ogre is for?

A5 works just fine.

[video]

Thomas Fjellstrom

There's no reason we should put artificial limitations in place. If it's easy to allow people to make 3D stuff with Allegro, we should do it.

Of course allegro will never have its own full 3D engine api, but it shouldn't get in your way when you're trying to make one.

Raidho36
Vanneto said:

Just like you could stop talking so much and actually get to coding some of the features you think are lacking. :P

That ain't my style, I'll just complain that those are missing rather than make them cease missing. ;D

Quote:

A5 is still a 2D library. Why do we need Allegro to be a 3D library?

True. Yet that doesn't mean you shouldn't employ 3d-originated techniques, given that modern video cards have little to no support for 2d acceleration (nowadays the whole thing is done via 3d). Also, Allegro's sound doesn't have 3d-originated features such as the doppler effect (despite using OpenAL), which is an obvious lack of features. Though it does have a "speed" modifier that kinda makes up for the doppler stuff, if you can calculate it right. Also, panning has to be done manually. But, I mean, OpenAL has those built in; why make users implement them on their own on top of Allegro's wrappers?

Quote:

Isn't that what Ogre is for?

IIRC Ogre is written in C++, which I can't stand as a library-writing language. A library should be written in C (or assembly), unless it targets one particular language of choice.

Quote:

I don't see what the complaints are all about.

The library isn't optimal enough and doesn't provide certain features it could easily have provided, that's what. Just because it's "good enough" as is doesn't mean you shouldn't put further effort into making it better, particularly not by coming up with reasons why not to. Come to think of it, as opposed to these vaguely (mostly zero) productive discussions, that would even be counter-productive. I perfectly realize that fixing bugs is higher priority, though; just conduct some discussion. Like, I never said you should go do it; it's not my fault you took it that way.

Quote:

Of course allegro will never have its own full 3D engine api, but it shouldn't get in your way when you're trying to make one.

What about OpenAL part? I haven't looked at it that closely.

Thomas Fjellstrom

OpenAL isn't related to 3D video at all.

Raidho36 said:

Just because it's "good enough" as is doesn't mean you shouldn't put further effort into making it better, particularly not by coming up with reasons why not to.

People do what they are interested in, in their own free time. If someone wants something specific, they should just do it, instead of whinging about it not having been done for them.

Raidho36

Quote:

OpenAL isn't related to 3D video at all.

Because it's related to 3d audio? Seriously though, I see zero reason for Allegro's sound to limit users' interaction with OpenAL to very basic features. It's not that it's deliberately blocked, but unlike the OpenGL part, it makes it a choice between the audio addon or bare OpenAL, as far as I can see.

Quote:

People do what they are interested in, in their own free time

Yeah, I know, see above. I sincerely would go patch something up if it weren't for the need to install shitloads of libraries and programs, and to spend hours getting them to work if I happen to be unlucky with my system setup, just to build liballeg from source (because, you know, things aren't done easily on Windows), and then to spend days figuring out the right way to integrate new sources into the library. So right now I'm just doing my own stuff, like mirror inter-reflections for a 3d laser game (for some reason perspective gets screwed up for certain models after a bunch of iterations, although the matrices seem OK).

Thomas Fjellstrom

There was going to be a much fancier audio api, but it never got finished.

I'm not really sure how the OpenAL api works, so I don't know if it limits you from using it if you also use Allegro.

Raidho36

As I said, unlike the OpenGL part, it doesn't supply you with functions that let you use Allegro's internal sound-related structures with bare OpenAL. Like, the OpenGL part can return a valid OpenGL texture from an Allegro bitmap, but the audio addon has nothing of the sort.

Elias

It's not needed with audio, you can just completely bypass Allegro's audio. (With OpenGL some connection was required because OpenGL can't create the window.)

Thread #612620. Printed from Allegro.cc