Improving drawing speed

Aikei_c

I have very slow drawing right now: if I draw more than 1000 sprites, the FPS falls to 30-40.
My sprites are either 96x96 or 128x128. Each of them is in a different file and in a different ALLEGRO_BITMAP.
I found a solution here, which is to use atlases, and I decided to create an atlas for each of my characters. However, an atlas for just one character (with all states, directions, etc.) gets extremely big: it grows to 1.5 MB on disk and about 50 MB (!) in memory. If I ever need to draw just 10 different characters at the same time, I would need to load 10 of these big atlases, which would take 500 MB of memory. Isn't that too much?
Is there a more efficient way to use atlases?

Thomas Fjellstrom

Aikei_c said:

If I ever need to draw just 10 different characters at the same time, I would need to load 10 of these big atlases, which would take 500 MB of memory. Isn't that too much?

Whether they are loaded as separate bitmaps or one large bitmap, they take up the same amount of memory.

If you really don't want to load all of that at once, maybe split out the types of frames into separate atlases. Say if one section of your game doesn't use half of the different frames, stick them in separate atlases.

Also, 50MB isn't all that much, when you consider my laptop has 2GB vram.

Aikei_c

Thanks for the fast response.
Well, if I load them as separate bitmaps, I can dispose of loaded bitmaps on the fly if the game finds out that some of them are not used and I'm running out of memory. But I can't dispose of a character's whole atlas if at least some images of this character are still to be drawn.

I'll consider what you said about breaking the images down into atlases more efficiently.

But what if there are not 10 different characters on screen, but 50? Or 100?

BTW, does Windows Task Manager show both video RAM usage and "ordinary" RAM usage? I can't seem to find out whether they are summed up or broken down...

Thomas Fjellstrom

Aikei_c said:

BTW, does Windows Task Manager show both video RAM usage and "ordinary" RAM usage? I can't seem to find out whether they are summed up or broken down...

Not with discrete video cards that have dedicated video memory.

Quote:

But what if there are not 10 different characters on screen, but 50? Or 100?

If you want to get really extreme, try to make a dynamic resource manager that automatically updates a set of atlases based on which images are in use. So maybe you store them separately, but as you load them, they are added to an atlas at run time. When an atlas gets full, you can either free some older, rarely used images or create a new atlas. For that, maybe use an LRU list to manage what gets to stay and what gets kicked out.

Of course that's a bit complicated. You could get away with grouping your sprites into categories which get put into separate atlases.
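
A minimal sketch of the LRU bookkeeping such a manager might use, assuming a fixed number of equally sized atlas slots. The class name `AtlasSlotCache` and its methods are hypothetical and only track which image owns which slot; actually drawing an image into its slot is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: hands out fixed-size atlas slots and evicts the
// least recently used image when every slot is taken.
class AtlasSlotCache {
public:
    explicit AtlasSlotCache(std::size_t slotCount) {
        assert(slotCount > 0);
        // Fill the free list so that slot 0 is handed out first.
        for (std::size_t i = slotCount; i > 0; --i) free_.push_back(i - 1);
    }

    // Returns the slot for `name`. Sets `needsUpload` to true when the caller
    // has to draw the image into that slot (first use, or reuse after eviction).
    std::size_t acquire(const std::string& name, bool& needsUpload) {
        auto it = index_.find(name);
        if (it != index_.end()) {
            // Already resident: move it to the front of the LRU list in O(1).
            lru_.splice(lru_.begin(), lru_, it->second);
            needsUpload = false;
            return it->second->slot;
        }
        std::size_t slot;
        if (!free_.empty()) {
            slot = free_.back();
            free_.pop_back();
        } else {
            // Atlas full: evict the entry at the back (least recently used).
            slot = lru_.back().slot;
            index_.erase(lru_.back().name);
            lru_.pop_back();
        }
        lru_.push_front(Entry{name, slot});
        index_[name] = lru_.begin();
        needsUpload = true;
        return slot;
    }

    bool contains(const std::string& name) const {
        return index_.count(name) != 0;
    }

private:
    struct Entry { std::string name; std::size_t slot; };
    std::list<Entry> lru_;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
    std::vector<std::size_t> free_;  // slot indices not yet in use
};
```

Keeping an iterator per entry means a cache hit only splices one list node, so the per-sprite lookup cost does not grow with the number of resident images.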

Aikei_c

I have another question then: why does Windows Task Manager show any system memory in use for the bitmaps at all, if all of them should be loaded into video memory? I double-checked that my bitmaps are loaded as video bitmaps, not memory ones.

Thomas Fjellstrom

Ah, Allegro by default keeps a copy of them in system memory because DirectX can and will at any moment, completely shit itself and lose the contents of every texture you loaded. You can turn that off though (with the ALLEGRO_NO_PRESERVE_TEXTURE bitmap flag), but then you have to manually re-upload your bitmaps when notified that the display has been restored.

Aikei_c

Ok, thanks for clarifying this.

Kris Asick

One of the unfortunate things about working with hardware acceleration is that you can only load so much uncompressed data into video memory at a time. Hardware acceleration was designed around 3D graphics, so 3D meshes would form the shape and motion while textures would create the look. With 2D graphics, you typically need to use textures for all three of those aspects, and you need to do it at higher resolutions than a texture would typically need for a 3D object, resulting in massive amounts of video RAM usage.

The best thing you can do is streamline the process, reduce the number of frames of animation in your 2D objects, only load certain animations when absolutely necessary while unloading any that aren't being used, build objects out of multiple components whenever possible to avoid having to use a large series of images to animate them, and yeah, to really get your drawing speeds up, avoid switching the active source texture too often. (Allegro sub-bitmaps are OK to switch between so long as they all have the same source bitmap. Also look into the al_hold_bitmap_drawing() command.)

It should also be noted that transferring data from system RAM to video RAM is pretty darn fast, so in a pinch, as previously suggested, you can create a dynamic system for cycling certain character sets into and out of video memory as needed. That said, unless you're writing a 64-bit application (very unlikely if you aren't explicitly aware that you're doing it) you can still only address about 2 GB of system RAM reliably.

So long as you don't exceed 512 MB of video RAM usage, your game should be playable on most modern systems. Try to avoid exceeding 1024 MB of video RAM usage though because then only high-end systems will be able to play your game. You may also want to consider having texture quality settings so that players with less-capable hardware can load all the texture data in at half-size, which would cut memory usage down by 75%!
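
To illustrate the advice about not switching the active source texture too often, here is a small sketch that counts texture switches in a draw list and groups the list by atlas. The `DrawCmd` struct and both function names are made up for this example, and regrouping is only an option where draw order between atlases does not matter (for instance within one layer).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct DrawCmd {
    int atlasId;  // hypothetical id of the parent bitmap the sprite lives in
    float x, y;   // screen position
};

// Counts how often the source texture changes when drawing in list order.
std::size_t countTextureSwitches(const std::vector<DrawCmd>& cmds) {
    std::size_t switches = 0;
    for (std::size_t i = 1; i < cmds.size(); ++i)
        if (cmds[i].atlasId != cmds[i - 1].atlasId) ++switches;
    return switches;
}

// Groups commands by atlas while keeping the relative order inside each group.
void groupByAtlas(std::vector<DrawCmd>& cmds) {
    std::stable_sort(cmds.begin(), cmds.end(),
                     [](const DrawCmd& a, const DrawCmd& b) {
                         return a.atlasId < b.atlasId;
                     });
}
```

With five sprites alternating between two atlases, the unsorted list changes texture four times, while the grouped list changes it once.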

Arthur Kalliokoski

Kris Asick said:

So long as you don't exceed 512 MB of video RAM usage, your game should be playable on most modern systems.

That's 128 different uncompressed 1024x1024x32 images.

Kris Asick

Arthur Kalliokoski said:

That's 128 different uncompressed 1024x1024x32 images.

Yup!

Or 512 512x512x32 images. Or 2048 256x256x32 images.

But, you can eat through this memory VERY fast when you start doing full-on animations and everything.

Actually, the real storage space available is probably slightly less since you have to account for memory indexing and everything. Storing the image itself into a texture is one thing, but the video card is also going to store additional data about that texture so it knows how to access it properly.

OH! One other thing Aikei: Don't exceed 4096x4096 for your texture sizes as again, doing so may limit who can use your software. Most video cards nowadays can handle larger, but some of the integrated chipsets people have with lower end systems cannot. This is why you can't just load up every single image you have onto one gigantastic bitmap.
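
The numbers quoted in the last few posts can be checked with a little arithmetic. This is only a sketch: it assumes 4 bytes per pixel, ignores whatever bookkeeping the driver adds per texture, and the function names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t MiB = 1024ull * 1024ull;

// Bytes needed for an uncompressed texture at 32 bits (4 bytes) per pixel.
constexpr std::uint64_t textureBytes(std::uint64_t width, std::uint64_t height) {
    return width * height * 4;
}

// How many textures of the given size fit into a video memory budget.
constexpr std::uint64_t texturesInBudget(std::uint64_t budgetBytes,
                                         std::uint64_t width,
                                         std::uint64_t height) {
    return budgetBytes / textureBytes(width, height);
}

// Conservative size check using the 4096x4096 limit suggested above.
constexpr bool withinSafeTextureSize(std::uint64_t width, std::uint64_t height) {
    return width <= 4096 && height <= 4096;
}
```

A 1024x1024 texture comes out at 4 MiB, so 128 of them fill 512 MiB. Halving both dimensions divides the size by four, which is where the 75% saving from a half-size texture setting comes from.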

Aikei_c

I think I'll stay with separate bitmaps instead of making atlases. I'll probably make a dedicated thread for drawing. At least for now, until I run into actual performance issues (I only checked how many sprites I could draw with 60 fps, but never needed that many yet).
I'll still make atlases for tiles, but actors will use separate bitmaps.

Thomas Fjellstrom

Throwing your drawing in a thread isn't going to help. It's still going to be just as slow, but in another thread... Also you have to make sure you're not using the 3d hardware from more than one thread, or things will slow down much further.

Aikei_c

Thomas Fjellstrom said:

It's still going to be just as slow, but in another thread

Of course it will be, however, the drawing will be executed simultaneously with all the game logic code, which should give it more time to draw, or am I mistaken here?

Elias

Threads run on the CPU but most of the drawing is done on the GPU. So unless the bottleneck of the drawing is the part which is done on the CPU a thread won't help much. And even if the problem is on the CPU and not the GPU, then game logic usually uses so little CPU that it won't affect FPS much having it in a separate thread.

Kris Asick

Go download the Vectorzone public alpha from my website. It's my current project and while the alpha presently available is horribly out of date, it burns a massive amount of GPU power and can handle up to 10,000 sprites at once on mid-range systems... and when I run it on my new high-end computer, it reports so little CPU usage it simply doesn't register anything more than 0%.

Separating rendering and game logic into separate threads is NOT going to help performance. In fact, it will probably make it slightly worse by the added steps necessary to sync the two.

GPUs may be super-powerful, but they process so much more raw data that you're far more likely to bottleneck the GPU before the CPU.

If you're gonna multi-thread anything, loading and unloading of game assets is a good thing to put into a separate thread if you need a persistent and expansive world. Otherwise there's really no point in going multi-threaded.

Aikei_c

Ok, thanks for this information, I'm not gonna multithread it, then.
I'll probably use the dynamically generated atlas trick. I already have something like that anyway (the LRU list), just without keeping the resources in an actual bitmap, so it shouldn't be that darn hard.
Kris, do you use atlases for your game? Which tricks do you use to improve drawing speed?

Actually, my video card is pretty bad, that might also matter.

Arthur Kalliokoski

It just now occurred to me to look for advice to load bitmaps after creating the display, but I don't see anybody mentioning this.

Aikei_c

I already do this.

Kris Asick

Aikei_c said:

Do you use atlases for your game? Which tricks do you use to improve drawing speed?

If you mean keeping related sprites all together on a single texture and using Allegro's sub-bitmap functions to split the textures into multiple usable images then yes, I do. I've just never used the term "atlas" to refer to those. ::)

I also have a drawing trick in place similar to how the NES and old game consoles rendered their backgrounds, by drawing map tiles to one giant texture only when they first come into view, and then drawing the visible portion of that texture to the screen. Otherwise, I'd be drawing thousands of tiles per frame and it's not so much the size of the images you draw that will kill your performance, but the QUANTITY, since every call to render something has to pass from the CPU to the GPU first, with the GPU wasting power waiting for the next rendering request. This is also what makes it possible to do the depth effect I have going on, which is actually accomplished using a SINGLE call to al_draw_prim(). (Which is important as al_draw_prim() has a very high overhead cost to use and calling it multiple times per frame can really hurt your framerate.)

Quote:

Actually, my video card is pretty bad, that might also matter.

One plus side to working with outdated video hardware is you can knowingly build your software to work within those limitations to broaden the scope of how many people can play your game without having a terrible framerate. The big drain on GPU power in my project is the fragment shader I use to achieve a full-screen glowing effect, but turning that shader off can double the framerate on low-end systems.
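
The tile-caching trick described above boils down to remembering which tiles have already been rendered into the big texture. A rough sketch of that bookkeeping follows; the `TileCache` class is hypothetical, it does no drawing itself, and it does not reflect how Vectorzone actually implements this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping for a cached background: remembers which map tiles
// have already been rendered into one large off-screen texture.
class TileCache {
public:
    TileCache(int cols, int rows) : cols_(cols), rows_(rows) {
        assert(cols > 0 && rows > 0);
        rendered_.assign(static_cast<std::size_t>(cols) * rows, false);
    }

    // Marks every tile in the visible rectangle [x0, x1] x [y0, y1] as rendered
    // (coordinates are clamped to the map) and returns how many were new, i.e.
    // how many tile draws into the cache texture this frame actually needs.
    int reveal(int x0, int y0, int x1, int y1) {
        int newlyDrawn = 0;
        for (int y = clamp(y0, rows_); y <= clamp(y1, rows_); ++y) {
            for (int x = clamp(x0, cols_); x <= clamp(x1, cols_); ++x) {
                std::size_t i = static_cast<std::size_t>(y) * cols_ + x;
                if (!rendered_[i]) {
                    rendered_[i] = true;
                    ++newlyDrawn;
                }
            }
        }
        return newlyDrawn;
    }

private:
    static int clamp(int v, int limit) {
        return v < 0 ? 0 : (v >= limit ? limit - 1 : v);
    }
    int cols_, rows_;
    std::vector<bool> rendered_;
};
```

Scrolling a 10x10 tile view one tile to the right then costs 10 tile draws into the cache plus one draw of the visible region, instead of 100 tile draws every frame.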

Aikei_c

It might seem strange, but using one big bitmap instead of many small ones didn't help.
Actually it even made everything a whole lot slower.
Must be my video card, I have this: http://www.nvidia.com/object/geforce_8600.html

Thomas Fjellstrom

It depends on how you implemented it. It should be faster, if done right, on even (especially) ancient cards.

Aikei_c

Ok, I implemented it like this: if an image needs to be drawn, the GetImage() function is called, which first loads the image and draws it into the atlas, and then returns a pointer to the sub-bitmap of the atlas where this image was drawn.
Obviously, if the image is already in the atlas, it is not drawn into the atlas again, and the GetImage() function just returns a pointer to the sub-bitmap where this image is.
For some reason, this code causes a noticeable slowdown compared to the old code, which just used separate bitmaps and not sub-bitmaps of the same one.

Anyway, what is the right way to do it?

Thomas Fjellstrom

Have you made sure to wrap the drawing calls with al_hold_bitmap_drawing calls?

You want to hold drawing before drawing from the same texture, and un-hold after.

And are you sure you don't have a small bug that makes it constantly re-add the texture? Uploading isn't the speediest thing to do.

Aikei_c

I am sure I don't add textures too much, I checked it. They only get added in the beginning, and no more than needed. But it still lags where the previous code doesn't.

I also have a breakpoint on the code which actually loads and draws into atlas and it only gets hit in the beginning. I also "printed out" atlas to check that only needed bitmaps are loaded.

The GetImage() function, which is the only altered function compared to the old code, mostly just executes lines which only return pointer to a subbitmap (after some loading in the beginning).

I do use al_hold_bitmap_drawing.

P.S.: By the way, the al_hold_bitmap_drawing seems to help the old code even better.

P.P.S.: Will I be able to improve performance if I use al_draw_bitmap_region instead of subdividing the atlas into sub-bitmaps?

~~P.P.P.S.: And if I decrease the size of my atlas bitmap from 2048x2048 to 1024x1024 I have a significant increase in fps (although still not nearly enough to be on par with the old code).~~
That's not true. Actually, the fps is about the same, no matter the size. Probably there is a problem in my code, I'll try to find it.

Kris Asick

Based on what you described, my guess is that you're loading images that are already loaded, thus creating a massive influx of loading calls.

At this point though, you may want to post your code so we can see for ourselves where the bottleneck may be.

Also, make sure you only call al_hold_bitmap_drawing() twice per frame: Once when you start making drawing calls and again when you're done. Calling it repeatedly throughout the rendering of a frame won't provide any speed benefits and would likely make things worse.

Aikei_c

I'm not loading images that were already loaded. There's only one place where they are loaded and I have a breakpoint there, as I said before. I'm sure I'm not loading them several times.
Actually, I figured out where the bottleneck was: the custom comparison function for map<>, which was pretty nasty. I've fixed it now.
Now I have the same speed as I had before when I used different bitmaps.
There's actually too much code for you to handle, I'm afraid.

Thomas Fjellstrom

If that's the case, then maybe you should pare/simplify down the code till you have a single, simple, concise example to show us that has the problem you're seeing.

Aikei_c

Ok, everything starts here:

  ImageResource* res = objectAnimation[id][state][direction][frame];
  ImageAllocationMap::iterator it = imageAllocationMap.find(res->m_Name);
  if (it != imageAllocationMap.end())
  {
    Update(res);
    Point<int> pt = it->second;
    return subbitmaps[pt.x][pt.y];
  }
  else
  {
    return LoadImage(res);  
  }

the Update function:

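void Storage::Update(ImageResource* imageResource)
{
  // Move the resource to the front of the LRU list (most recently used).
  // Note: if m_lru is a std::list, remove() is a linear scan on every call.
  m_lru.remove(imageResource);
  m_lru.push_front(imageResource);
}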

the LoadImage function:

ALLEGRO_BITMAP* Storage::LoadImage(ImageResource* imageResource)
{
  // Read the image file from the archive into a temporary buffer and decode it.
  int size = zipFile->VGetResourceSize(*imageResource);
  char* buffer = new char[size];
  zipFile->VGetResource(*imageResource,buffer);
  ALLEGRO_FILE* f = al_open_memfile(buffer,size,"r");
  ALLEGRO_BITMAP* img = al_load_bitmap_f(f,imageResource->m_extension);
  al_fclose(f);
  delete[] buffer;
  if (!img)
  {
    Logger::Write(APP_LOG,FILE_LINE("%s <%s>"),"Error loading bitmap", imageResource->m_Name.getStr().c_str());
    return NULL;
  }
  al_convert_mask_to_alpha(img,al_get_pixel(img,0,0));
  // Take a free atlas slot, evicting one image first if none is left.
  if (freePoints.empty())
    FreeOneImage();
  int ipt = *freePoints.begin();
  Point<int> pt (ipt % atlasSize, ipt / atlasSize);
  // Release any hold first: changing the target bitmap while drawing is
  // held is not supported.
  if (drawingHeld)
    al_hold_bitmap_drawing(false);
  // Copy the image into its slot, then make the screen the target again.
  al_set_target_bitmap(subbitmaps[pt.x][pt.y]);
  al_draw_bitmap(img,0,0,0);
  al_set_target_bitmap(screen);
  imageAllocationMap.insert(make_pair(imageResource->m_Name,pt));
  freePoints.erase(ipt);
  al_destroy_bitmap(img);
  m_lru.push_front(imageResource);
  // Restore the hold state the caller was drawing with.
  if (drawingHeld)
    al_hold_bitmap_drawing(true);
  return subbitmaps[pt.x][pt.y];
}
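
The slot arithmetic in LoadImage (`ipt % atlasSize`, `ipt / atlasSize`) can also produce a pixel rectangle directly, which is what a call such as al_draw_bitmap_region would need as its source region instead of a sub-bitmap. A small sketch, assuming a square grid of equally sized cells; `Region`, `slotRegion` and the parameter names are hypothetical and not part of the posted code.

```cpp
#include <cassert>

struct Region { int x, y, w, h; };  // pixel rectangle inside the atlas

// Maps a linear slot index to its pixel rectangle, assuming a square grid of
// slotsPerRow x slotsPerRow cells, each cellSize pixels wide and high.
Region slotRegion(int slotIndex, int slotsPerRow, int cellSize) {
    assert(slotsPerRow > 0 && cellSize > 0);
    assert(slotIndex >= 0 && slotIndex < slotsPerRow * slotsPerRow);
    int col = slotIndex % slotsPerRow;  // same mapping as pt.x in LoadImage
    int row = slotIndex / slotsPerRow;  // same mapping as pt.y in LoadImage
    return Region{col * cellSize, row * cellSize, cellSize, cellSize};
}
```

For a 2048x2048 atlas split into 128x128 cells there are 16 slots per row, so slot 17 lands at pixel (128, 128).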

Thomas Fjellstrom

I meant, try and pull out what that code is trying to do, in a short and simple example, that still has the problem you're seeing. something we can all run and test.

Aikei_c

I'll think about it tomorrow.

UPDATE:
Here is what I found out: if I just launch a simple Allegro project which only draws one image over and over, it can only draw 2000 96x96 images at 60 fps on my machine, then it starts to slow down. I'm pretty sure that the rest of the frame time in my project is eaten by isometric sorting (needed to know which character to draw first), the dynamic resource loading system and various other smaller things. I probably can't hope for more than 1000 for my project.

Kris Asick

1000 is still a lot so long as you stay away from particle effects, "Bullet-Hell" shooters and maps with very small tile sizes.

Most of the people into indie games are going to have at least half-way decent hardware so you shouldn't worry too much about developing for low-end systems. Instead, I'd recommend upgrading your video hardware so you can get a better idea of just how much power you can utilize.

10,000 sprites per frame is about the limit you'll be able to reach in Allegro with mid-range graphics hardware and proper coding methodology while maintaining 60 FPS. Don't know what the limits are with high-end stuff as I haven't tested yet.

Simon Parzer

I would not fret about performance until you actually run into problems. 1000 sprites is a lot already. Go compare that to what an HTML5 or Flash game can utilize.
Developing a game is a creative task and as such, you can do a lot about performance issues, even without optimizing your code, just by being clever about how you do stuff and working around limitations. The quality of your game certainly does not scale with the number of different sprites you can display each frame.

Kris Asick

Unless you're like me and can't stand it when some 2D game can't run 60 FPS. ::)

Thread #613190. Printed from Allegro.cc