Allegro 5: Maximizing and optimizing deferred rendering

Nicol Bolas

I was looking at the Allegro 5 documentation and I came across the section on deferred drawing.

This is a very useful optimization. By keeping to a single "texture", it allows you to minimize the number of state changes the renderer does. However, there are 2 ways to improve this.

The first way is to expose array textures to the user in some way. For those who don't know, an array texture is really just a 3D texture, except that filtering between depths in the texture is not allowed. This means that you can have a number of individual 2D textures that you don't need to change texture state for. This increases the gain from deferred drawing. Indeed, you may only ever have one or two actual "textures", and just pull from a couple of array textures.

I'm not sure how to do this from an API standpoint. Perhaps some kind of ALLEGRO_LAYERED_BITMAP, which itself is not an ALLEGRO_BITMAP. But the user can request a layer from it, which is an ALLEGRO_BITMAP. There would of course be appropriate construction and destruction functions. All of the usual operations would work as expected on the ALLEGRO_BITMAP members. All of these ALLEGRO_BITMAPs would be considered to have the same parent, and thus be eligible for deferred drawing.

The other optimization is to make deferred rendering the default. The main issue with it presently is that it isn't automatic. There is no reason why the Allegro graphics driver cannot detect when the user has changed some state and then send the previously accumulated rendering commands.

Of course, like any good optimization, this would significantly complicate the Allegro graphics drivers. But it would be to the benefit of the user. First, because the optimization would be invisible: you don't have to use special API calls to get them. Second, because the optimization could be improved beyond what the API calls require.

For example, a recent thread on this issue has relaxed the state changing requirement, allowing the blend color to be changed. But taking advantage of this requires modifying applications, because before this improvement, changing any of the blending state resulted in undefined behavior. If deferred rendering were more automatic, then there would be no need for such modifications; the application would simply get faster.

Deferred rendering is an optimization. Gaining access to it should require as little API work as possible.

I'm obviously not an Allegro 5 engineer. So I am not aware of what the internal issues are with regard to making deferred rendering automatic. But I do think it would be worth the effort.
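As a rough illustration of the automatic flushing proposed above, here is a minimal sketch of a batcher that queues quads and submits them only when the texture or blend state changes. It is not Allegro code: `Batcher`, `batcher_draw_quad`, `batcher_set_blend` and the integer texture and blend ids are all hypothetical names invented for this example, and the "flush" only counts batches instead of issuing a real draw call.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX_QUADS 1024

/* Hypothetical batcher; none of these names are Allegro API. */
typedef struct {
    int texture_id;   /* texture used by the queued quads, -1 if none yet */
    int blend_mode;   /* blend state the queued quads were issued with */
    size_t queued;    /* quads waiting to be submitted */
    size_t flushes;   /* batches submitted so far */
} Batcher;

void batcher_init(Batcher *b)
{
    b->texture_id = -1;
    b->blend_mode = 0;
    b->queued = 0;
    b->flushes = 0;
}

/* Submit everything queued as one batch. A real driver would issue
   a single draw call here; this sketch only counts the batch. */
void batcher_flush(Batcher *b)
{
    if (b->queued == 0)
        return;
    b->flushes++;
    b->queued = 0;
}

/* Changing state flushes first, so queued quads keep the old state. */
void batcher_set_blend(Batcher *b, int blend_mode)
{
    if (blend_mode == b->blend_mode)
        return;
    batcher_flush(b);
    b->blend_mode = blend_mode;
}

/* Queue one quad; flush only if the texture differs or the queue is full. */
void batcher_draw_quad(Batcher *b, int texture_id)
{
    assert(texture_id >= 0);
    if (texture_id != b->texture_id || b->queued == BATCH_MAX_QUADS) {
        batcher_flush(b);
        b->texture_id = texture_id;
    }
    b->queued++;
}
```

With this shape the caller never asks for deferral explicitly: drawing a hundred quads from one texture produces one batch, and the batch boundary appears only where state actually changes. It relies on every state change going through the library, which is the assumption debated in the replies below.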

Thomas Fjellstrom

Nicol Bolas said:

There is no reason why the Allegro graphics driver cannot detect when the user has changed some state and then send the previously accumulated rendering commands.

From what I gather, checking current state is just as slow as setting it, so you'd actually lose a fair bit of performance by checking the current GL state all over the place.

I think the main problem with making deferred rendering more automatic is that it's just too hard, and probably very error-prone. You'd have to force everyone to use Allegro versions of GL functions to change state, and store the state changes in order with graphics calls. I'm not sure that's worth the effort. It might be if there's some nice way to make the API work the same way across OpenGL and D3D, but that's not always possible, and it would take a heck of a lot of work.

I think one of the next optimizations that Allegro might get is the "combined bitmaps" feature someone brought up ages ago. Basically you tell Allegro to take a large number of ALLEGRO_BITMAPs and pack them all into a single larger texture, and modify the original bitmaps to access the single larger texture. When used with deferred drawing it should make things a lot faster.

Nicol Bolas

The state getting issue is a non-issue, so long as you make the assumption that the user is using Allegro calls for their rendering. Thus Allegro is aware whenever state changes.

Are there a lot of cases where users are changing OpenGL state behind Allegro's back and expecting Allegro to work in some way? I do not think this is a good way to do things. If a user is relying on Allegro rendering things in a certain way, and changing OpenGL state to change how this happens predictably, then internal changes to Allegro's rendering would dramatically affect this.

For example, let's say that Allegro's OpenGL renderer currently draws using the OpenGL fixed-function pipeline. A user can bind an OpenGL fragment program to override this pipeline. However, if Allegro's internals change to instead use a fragment program itself to do certain rendering, then the user's program will be overridden by Allegro's program.

I do not think it is a good idea to encourage or rely upon this usage pattern. If this usage pattern is needed, to allow Allegro to do some things and normal GL/D3D rendering to do others, then there should be an API call. Something like "al_fix_allegro_state." This would tell the driver that the user has poked at the underlying GL/D3D state, and that it should now reassert its own internal copies of that state.

I don't think allowing the user to modify state behind Allegro's back and expecting everything to work just fine is a good idea.
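The idea of the library keeping its own copy of the state, plus a call like the suggested "al_fix_allegro_state", might look roughly like the sketch below. Everything here is hypothetical (`StateShadow`, `shadow_bind_texture`, `shadow_invalidate`), and the driver call is simulated by a counter. The point is only that the library compares against its own shadow copy instead of querying GL/D3D, and that one explicit call tells it the shadow can no longer be trusted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shadow of one piece of driver state; not Allegro API. */
typedef struct {
    int bound_texture;  /* what the library believes is currently bound */
    bool valid;         /* false when the user may have touched raw state */
    int driver_calls;   /* counts simulated calls into GL/D3D */
} StateShadow;

void shadow_init(StateShadow *s)
{
    s->bound_texture = 0;
    s->valid = false;   /* nothing is known until the first bind */
    s->driver_calls = 0;
}

/* Bind a texture, skipping the driver call when the shadow says it is
   already bound. No state is ever read back from the driver. */
void shadow_bind_texture(StateShadow *s, int texture)
{
    assert(texture >= 0);
    if (s->valid && s->bound_texture == texture)
        return;
    s->driver_calls++;  /* stands in for the real bind call */
    s->bound_texture = texture;
    s->valid = true;
}

/* Plays the role of the suggested "al_fix_allegro_state": the user says
   raw state was changed, so the next bind must reassert it. */
void shadow_invalidate(StateShadow *s)
{
    s->valid = false;
}
```

A program that never touches GL directly pays nothing extra, while one that does calls the invalidate function once before returning to library drawing.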

The "combined bitmaps" API sounds much simpler and more effective than the one I was thinking of. One problem with this is that you can't do it at load-time. You have to do it at runtime, and after you have created your BITMAPs. Perhaps a better way would be to create a "bitmap allocator" object. You just ask it for ALLEGRO_BITMAPs, and it will ensure that they all come from the same parent, allocating chunks from the bitmap as needed. If it runs out, it can return NULL, allowing you to know when you ran out of texture.
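A "bitmap allocator" of this kind could be sketched as follows, using a simple shelf (row-by-row) strategy chosen only for brevity. The names `RegionAllocator`, `region_allocator_init` and `region_alloc` are made up for the example, and failure is reported with `false` where the post suggests returning NULL. In real code each returned region could presumably be passed to `al_create_sub_bitmap` on the shared parent, but that step is left out so the sketch stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>

/* A rectangle inside the shared parent texture. */
typedef struct { int x, y, w, h; } Region;

/* Hypothetical shelf allocator; not part of Allegro. */
typedef struct {
    int size;       /* side length of the square parent texture */
    int cursor_x;   /* next free x position on the current shelf */
    int shelf_y;    /* top edge of the current shelf */
    int shelf_h;    /* height of the tallest region on the current shelf */
} RegionAllocator;

void region_allocator_init(RegionAllocator *a, int size)
{
    assert(size > 0);
    a->size = size;
    a->cursor_x = 0;
    a->shelf_y = 0;
    a->shelf_h = 0;
}

/* Reserve a w x h region. Returns true and fills *out on success, or
   false (leaving the allocator unchanged) when the parent is full. */
bool region_alloc(RegionAllocator *a, int w, int h, Region *out)
{
    assert(w > 0 && h > 0 && out != 0);
    int x = a->cursor_x, y = a->shelf_y, shelf_h = a->shelf_h;

    if (x + w > a->size) {      /* does not fit: open a new shelf below */
        y += shelf_h;
        x = 0;
        shelf_h = 0;
    }
    if (w > a->size || y + h > a->size)
        return false;           /* out of texture */

    out->x = x; out->y = y; out->w = w; out->h = h;
    a->cursor_x = x + w;
    a->shelf_y = y;
    a->shelf_h = h > shelf_h ? h : shelf_h;
    return true;
}
```

Because every region comes from the same parent, bitmaps created this way would all be eligible for the same deferred batch, and no pixels are copied after loading.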

Trent Gamblin

Nicol Bolas said:

The state getting issue is a non-issue, so long as you make the assumption that the user is using Allegro calls for their rendering. Thus Allegro is aware whenever state changes.

That's not a safe assumption at all.

Todd Cope

You could make this assumption if you required the user to call some function before beginning to use OpenGL/D3D calls and another when they are done.

Elias

Or alternatively, document which state is changed by A5 functions (and which state affects them). Then users can write their own functions to store/restore the OpenGL state they need to. What it means for A5 though is, we could then expect no state changes between our function calls - making things as fast as possible for the case when users do not use any OpenGL calls of their own.

Peter Wang

Elias said:

Or alternatively, document which state is changed by A5 functions (and which state affects them).

This would be useful information. However, we don't want to tie the API to the implementation. The current situation is not much good either. I think it would be safer to introduce explicit begin/end functions.

Nicol Bolas

I am not sure that an explicit "end" function is needed. The user knows when he is changing underlying GL state. And the user knows that, because he is doing so, he can no longer rely on Allegro functions. So all the user needs to do is call one function before calling any Allegro functions, and not call any Allegro functions after changing underlying state unless they call the starting function again.

I suppose one thing that an explicit end function would allow is for Allegro to turn rendering entirely off. So that attempts to render outside of the bracketed code would fail, and thus not produce undefined results.

SiegeLord

Nicol Bolas said:

By keeping to a single "texture", it allows you to minimize the number of state changes the renderer does.

Not quite. The speedup of the deferred rendering is that it allows you to send a lot of vertices in a single batch, instead of having to send 4/6 at a single time. The state changes were already basically minimized before the deferred rendering was implemented.

Quote:

I was planning on supporting this via texture atlasing. It's still not implemented, but it is up there on my list of things to do. The proposed API was this:

ALLEGRO_ATLAS* al_create_atlas(int size);

/*
Tries to add the src bitmap to the atlas, returning a pointer to the
new bitmap upon success, or null upon failure. The old bitmap is unaltered,
and should be destroyed by the user as appropriate.
*/
ALLEGRO_BITMAP* al_add_bitmap_to_atlas(ALLEGRO_ATLAS* atlas, ALLEGRO_BITMAP* src);

/*
Removes all of the internal structures used by the atlas, so it doesn't take up lots of useless space
*/
void al_finalize_atlas(ALLEGRO_ATLAS* atlas);

void al_destroy_atlas(ALLEGRO_ATLAS* atlas);

Todd Cope

If you are going to make atlasing functions you will need to have some sort of mode parameter. If the bitmaps are right next to each other in the atlas and the user has filtering enabled the adjacent pixels will be partially sampled during rendering. For sprites you will need to have a pixel of padding between the bitmaps. Tiles for tilemaps are a bit trickier. In order to make them look right you will have to extend the pixels around the edges and the corners, otherwise you will get seams and glitches in the rendering. That is all assuming you want to support the use of filtering.

SiegeLord

I was planning on always having a clamped 1 pixel border for the images. Space efficiency aside, would that ever be unsatisfactory?

Todd Cope

SiegeLord said:

Space efficiency aside, would that ever be unsatisfactory?

By clamped do you mean extending the edge pixels out in each direction? If so a 1 pixel border will work fine. For example, say you have a 32x32 tile. The tile will take up a space of 34x34 in the atlas.

Sprites look best when you don't extend the pixels out and instead leave a blank (0 alpha) border. You should give the user both options if possible. Something like this should do:

/* flags can be ALLEGRO_ATLAS_CLAMP, 0 means no extending texture data */
ALLEGRO_BITMAP* al_add_bitmap_to_atlas(ALLEGRO_ATLAS* atlas, ALLEGRO_BITMAP* src, int flags);
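To make the "clamped" option concrete, here is a small sketch of extending the edge pixels by one in each direction, so a w x h image occupies (w+2) x (h+2) in the atlas (32x32 becomes 34x34). It assumes pixels are plain 32-bit values in row-major order, and `pad_with_clamped_border` is a hypothetical helper, not an Allegro function. The blank-border variant for sprites would fill the outer ring with 0 instead.

```c
#include <assert.h>
#include <stdint.h>

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Copy a w x h image into a (w+2) x (h+2) buffer, repeating the edge
   and corner pixels into the one-pixel border.
   dst must have room for (w+2)*(h+2) pixels. */
void pad_with_clamped_border(const uint32_t *src, int w, int h, uint32_t *dst)
{
    assert(src != 0 && dst != 0 && w > 0 && h > 0);
    int pw = w + 2;                              /* padded row length */
    for (int y = 0; y < h + 2; y++) {
        int sy = clamp_int(y - 1, 0, h - 1);     /* nearest source row */
        for (int x = 0; x < pw; x++) {
            int sx = clamp_int(x - 1, 0, w - 1); /* nearest source column */
            dst[y * pw + x] = src[sy * w + sx];
        }
    }
}
```

With bilinear filtering, a sample taken at the very edge of the tile then blends with a copy of the tile's own edge pixel instead of a neighbouring bitmap, which is what removes the seams.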

Bob

Atlasing is more complicated than that: How do you deal with mipmaps (if you use them)? How do you deal with aniso, which with 16x AF can sample up to 17 texels away from the edge of an image?

GullRaDriel

We don't deal with anything regarding graphics. We just wait for Bob to come and fix the bugs

SiegeLord

Bob said:

How do you deal with mipmaps (if you use them)? How do you deal with aniso, which with 16x AF can sample up to 17 texels away from the edge of an image?

We don't. A4.9 does not support mipmaps to my knowledge, and anisotropic mapping is only of use for 3D stuff, which is not the domain of A4.9.

Nicol Bolas

SiegeLord said:

I was planning on supporting this via texture atlasing [en.wikipedia.org]. It's still not implemented, but it is up there on my list of things to do. The proposed API was this:

This requires that the user has loaded the image as an ALLEGRO_BITMAP already. This is not the most efficient way to go about this. It would be better to have this:

ALLEGRO_BITMAP* al_create_bitmap_from_atlas(ALLEGRO_ATLAS* atlas, int w, int h);

You could still have the other API call, but this one avoids a lot of texture copying.

Bob

Isn't that just subbitmaps?

SiegeLord

Nicol Bolas said:

This requires that the user has loaded the image as an ALLEGRO_BITMAP already.

99% of the time this is the case because the user loaded the bitmap from disk. And also:

Bob said:

Isn't that just subbitmaps?

Yep.

EDIT: Well, I suppose it's not entirely the case. The operation is identical to creating a sub-bitmap, it's just that you let the program choose where in the source bitmap the sub-bitmap is located. I suppose this function wouldn't hurt, although I am not seeing it being used a lot.

Another problem I just thought of is how to handle the extended borders and user editing the bitmaps. If the user adds a bitmap to the atlas, and then edits it... the extended border might no longer match the bitmap. Not too sure how to fix this yet.

BigBertus

@SiegeLord:
Well, I'd love to see the atlasing feature!

SiegeLord said:

ALLEGRO_ATLAS* al_create_atlas(int size);

Just curious... What does "size" mean here?

Do you want to pack the single bitmaps efficiently? How?
How does the user know which size to pick for all his bitmaps to fit?

SiegeLord said:

Hmm... Either don't support that case or add some function to refresh the extended border for the individual bitmap...?
I'd say if someone cares so much about performance that he puts his bitmaps into an atlas, he probably won't edit much of them anyway...

SiegeLord

Florian Bueren said:

Just curious... What does "size" mean here?

Do you want to pack the single bitmaps efficiently? How?
How does the user know which size to pick for all his bitmaps to fit?

An atlas will just be a giant square bitmap. Size specifies the side length of it. The user will be advised to pick some large size (I am sure this part will be very use-dependent, so he'll be encouraged to experiment too). In terms of 'how', I am looking at some algorithms I found on the Internet. Most simply partition the atlas into a tree structure and search the leaves for where to put the bitmap.
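One common form of the tree approach is sketched below, as an illustration of the general technique and not necessarily the algorithm that would end up in Allegro. Each node covers a rectangle of the atlas. Inserting a bitmap finds a free leaf that is large enough, splits it into a part that fits and a leftover part, and recurses. `PackNode`, `pack_node_create`, `pack_insert` and `pack_destroy` are hypothetical names, and the one-pixel border discussed earlier is ignored for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* One node of the partition tree; covers the rectangle (x, y, w, h). */
typedef struct PackNode {
    int x, y, w, h;
    bool used;                   /* true if this leaf holds a bitmap */
    struct PackNode *child[2];   /* both NULL for a leaf */
} PackNode;

PackNode *pack_node_create(int x, int y, int w, int h)
{
    PackNode *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->x = x; n->y = y; n->w = w; n->h = h;
    return n;
}

/* Returns the leaf that now holds a w x h rectangle, or NULL if no
   free leaf is large enough. */
PackNode *pack_insert(PackNode *n, int w, int h)
{
    if (n->child[0] != NULL) {             /* interior node: try both halves */
        PackNode *hit = pack_insert(n->child[0], w, h);
        return hit != NULL ? hit : pack_insert(n->child[1], w, h);
    }
    if (n->used || w > n->w || h > n->h)
        return NULL;                       /* occupied or too small */
    if (w == n->w && h == n->h) {
        n->used = true;                    /* exact fit */
        return n;
    }
    /* Split along the axis with more leftover space. */
    if (n->w - w > n->h - h) {
        n->child[0] = pack_node_create(n->x, n->y, w, n->h);
        n->child[1] = pack_node_create(n->x + w, n->y, n->w - w, n->h);
    } else {
        n->child[0] = pack_node_create(n->x, n->y, n->w, h);
        n->child[1] = pack_node_create(n->x, n->y + h, n->w, n->h - h);
    }
    return pack_insert(n->child[0], w, h);
}

/* Free the whole tree once no more bitmaps will be added. */
void pack_destroy(PackNode *n)
{
    if (n == NULL)
        return;
    pack_destroy(n->child[0]);
    pack_destroy(n->child[1]);
    free(n);
}
```

Freeing the tree with `pack_destroy` corresponds loosely to what the proposed `al_finalize_atlas` is described as doing: the packed positions are already fixed, so the bookkeeping can be dropped. How well a given atlas size works still depends on the order and sizes of the bitmaps inserted, which fits the advice to experiment.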

Thread #602408. Printed from Allegro.cc