I'm using Mac OS X 10.9 on a late-2011 MacBook Pro (Intel Core i7 2.2GHz) with 16GB RAM and an AMD Radeon HD 6750M 512 MB, so I don't think the problem described below is a hardware performance problem.
Every frame (30 fps) I draw a 2D terrain map. All bitmaps are video-card bitmaps of 32x32 pixels. The map is about 200x200 cells, so each frame draws 200x200 images of 32x32 pixels. This is very slow and causes 100% CPU usage.
There is a profiler screenshot in the attachment: al_profile.png
'hestur' in that screenshot is my application's name. Before the call to al_draw_scaled_bitmap, the code recursively walks a tree of nodes and draws each one, but walking the tree costs far less than the drawing itself.
This is how drawing is performed:
//---- texture.h file
struct texture {
    int retain_count;
    char *path;
    ALLEGRO_BITMAP *bitmap;
    struct size size;
    struct list_head list;
};

//---- drawing routine (draw bitmap at specified center and with specified size)
//---- this place performed for each 200x200 2d-map cell for each frame.
struct texture *t = n1->tex;
int x = -n1->draw_size.w * n1->center.x;
int y = -n1->draw_size.h * n1->center.y;
int w = n1->draw_size.w;
int h = n1->draw_size.h;
//xmsg("%d\n", al_get_bitmap_flags(t->bitmap));
al_draw_scaled_bitmap(t->bitmap, 0, 0, t->size.w, t->size.h, x, y, w, h, ALLEGRO_VIDEO_BITMAP);
If I uncomment xmsg(...), the printed flags value is 1056, which means the texture is an OpenGL object stored on the video card, not in main memory.
Can anyone help me solve this issue and determine where the bottleneck is?
P.S.
There was a similar thread, https://www.allegro.cc/forums/thread/612832 , but it has no solution.
///// UPDATE after 30 min
I did some research: I wrote two applications, one using SDL2 and one using liballegro.
Both perform the same task: drawing a 200x200 map with a 50x50 pixel texture.
Source for SDL:
#include <SDL2/SDL.h>
#include <math.h>

int main()
{
    SDL_Init(SDL_INIT_EVERYTHING);
    SDL_Window *wnd = SDL_CreateWindow(
        "My SDL Game",
        SDL_WINDOWPOS_UNDEFINED, SDL_WINDOWPOS_UNDEFINED,
        800, 600, SDL_WINDOW_SHOWN);
    SDL_Renderer *ren = SDL_CreateRenderer(wnd, -1,
        SDL_RENDERER_ACCELERATED | SDL_RENDERER_PRESENTVSYNC);
    SDL_Surface *bmp = SDL_LoadBMP("test.bmp");
    if (bmp == NULL){
        return 1;
    }
    SDL_Texture *tex = SDL_CreateTextureFromSurface(ren, bmp);
    SDL_FreeSurface(bmp);

    int f = 0;
    SDL_Event e;
    int quit = 0;
    while (!quit){
        ++f;
        while (SDL_PollEvent(&e)){
            if (e.type == SDL_QUIT) quit = 1;
            if (e.type == SDL_KEYDOWN) quit = 1;
            if (e.type == SDL_MOUSEBUTTONDOWN) quit = 1;
        }
        SDL_RenderClear(ren);
        int w = 32;
        int h = 32;
        for (int x = 0; x < 400; ++x) {
            for (int y = 0; y < 400; ++y) {
                SDL_Rect dst;
                dst.x = x*w;
                dst.y = y*h + sin(f / 30.0) * 50;
                dst.w = w;
                dst.h = h;
                SDL_RenderCopy(ren, tex, NULL, &dst);
            }
        }
        SDL_RenderPresent(ren);
    }

    SDL_DestroyTexture(tex);
    SDL_DestroyRenderer(ren);
    SDL_DestroyWindow(wnd);
    SDL_Quit();
}
Source for allegro:
#include <allegro5/allegro.h>
#include <allegro5/allegro_image.h>
#include <math.h>

int main()
{
    al_init();
    al_set_blender(ALLEGRO_ADD, ALLEGRO_ALPHA, ALLEGRO_INVERSE_ALPHA);
    al_init_image_addon();
    al_install_keyboard();
    al_install_mouse();
    al_set_new_display_flags(ALLEGRO_WINDOWED);
    al_set_new_display_option(ALLEGRO_VSYNC, 1, ALLEGRO_SUGGEST);
    ALLEGRO_DISPLAY *display = al_create_display(800, 600);
    ALLEGRO_EVENT_QUEUE *event_queue = al_create_event_queue();
    al_register_event_source(event_queue, al_get_display_event_source(display));
    al_register_event_source(event_queue, al_get_keyboard_event_source());
    al_register_event_source(event_queue, al_get_mouse_event_source());
    al_set_new_bitmap_flags(ALLEGRO_VIDEO_BITMAP);

    ALLEGRO_EVENT ev;
    ALLEGRO_TIMEOUT timeout;
    al_init_timeout(&timeout, 1.0 / 30.0);
    ALLEGRO_BITMAP *bmp = al_load_bitmap("test.bmp");

    int f = 0;
    while (1) {
        ++f;
        al_wait_for_event_until(event_queue, &ev, &timeout);
        al_clear_to_color(al_map_rgb(0, 255, 0));
        // al_hold_bitmap_drawing(1);
        for (int x = 0; x < 200; ++x) {
            for (int y = 0; y < 200; ++y) {
                al_draw_scaled_bitmap(bmp, 0, 0, 50, 50,
                                      x*32, y*32 + sin(f/20.0)*15, 32, 32, 0);
            }
        }
        // al_hold_bitmap_drawing(0);
        al_flip_display();
    }
    return 0;
}
CPU usage on the configuration described above:
SDL: 30% CPU and very fast.
Allegro: 100% CPU and very slow.
I don't know where the bottleneck in Allegro is, but my guess is that it lies in determining which triangles should be drawn and which can be thrown out.
Can somebody explain why CPU usage is so high with Allegro?
A few things of note:
1. al_set_blender() functions on the current target bitmap, so you don't want to call it until after you've set your target bitmap for drawing.
2. You should stick to powers of 2 for ALL primary texture sizes for sake of compatibility. (16, 32, 64, 128, 256, etc.) Only use non-power of 2 sizes with sub-textures.
3. The way you've coded your timing routine, instead of aiming to run at 30 FPS all the time, it runs at 30 FPS only when NOTHING is happening. Any keyboard key you press and any mouse movement you make will cause the framerate to spike, plus anything that happens following your event timeout will reduce you below the target 30 FPS, since you're essentially just adding a 0.0333 second delay to every frame.
4. 100% CPU usage can be caused by a number of factors, including GPU driver optimizations and power utilization settings for laptops. You can't go by CPU usage alone to determine that there's a problem or that the system's being overtaxed. My current project runs close to 20% CPU usage if my graphics drivers are set to perform "Threaded Optimization", whereas if I turn that feature of my GPU off, the CPU usage of my project is so low it doesn't register.
5. Uncomment your calls to al_hold_bitmap_drawing(), you're using them right and they WILL help your framerate! (There are very few instances where you don't want to use this command!)
Now, with all that said, here's something to keep in mind about doing 2D graphics: You can only get away with rendering about 10,000 sprites at a time on mid-range hardware using Allegro without killing the framerate. You're not only trying to render 40,000 sprites at a time but you're also delaying your program code by 0.0333 seconds every time you finish rendering those sprites. So if it takes 0.0333 seconds to render everything, your framerate won't be 30 FPS, it'll be 15 FPS!
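To make that arithmetic concrete, here is a minimal C sketch. The function names `effective_fps` and `remaining_wait` are made up for illustration and are not part of Allegro. The first computes the frame rate when a fixed delay is added on top of the render time; the second shows the alternative of waiting only for whatever is left of the frame budget.

```c
#include <assert.h>

/* Frames per second when every frame costs render_s seconds of drawing
   plus a fixed delay_s seconds of waiting. */
double effective_fps(double render_s, double delay_s)
{
    assert(render_s >= 0.0 && delay_s >= 0.0 && render_s + delay_s > 0.0);
    return 1.0 / (render_s + delay_s);
}

/* Seconds still to wait after rendering so that the whole frame lasts
   1 / target_fps. Returns 0 when rendering already used up the budget. */
double remaining_wait(double render_s, double target_fps)
{
    assert(render_s >= 0.0 && target_fps > 0.0);
    double frame_s = 1.0 / target_fps;
    double left = frame_s - render_s;
    return left > 0.0 ? left : 0.0;
}
```

With a render time of 1/30 s and a fixed delay of 1/30 s, `effective_fps` comes out at about 15, while `remaining_wait(1.0 / 30.0, 30.0)` is 0: the frame budget is already spent, so no extra wait should be added.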
If you need to render more than 10,000 sprites per frame, you need to look into using Allegro's primitives system, which can allow you to draw many thousands of objects stored in an array with a single call to al_draw_prim(). (Don't try calling al_draw_prim() itself thousands of times though, you'll kill your framerate even worse!)
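As a rough illustration of the "objects stored in an array" idea, the sketch below packs a grid of tiles into one array of triangle vertices in plain C. `QuadVertex`, `append_quad` and `build_tile_batch` are hypothetical names for this example; `QuadVertex` is only a stand-in for a real vertex type, texture coordinates are assumed to be in pixels, and the actual draw call is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a library vertex type:
   screen position plus texture coordinates (assumed to be in pixels). */
typedef struct { float x, y, u, v; } QuadVertex;

/* Append one textured rectangle as two triangles (6 vertices).
   Returns the new vertex count. */
size_t append_quad(QuadVertex *buf, size_t count, size_t capacity,
                   float x, float y, float w, float h,
                   float tex_w, float tex_h)
{
    assert(count + 6 <= capacity);
    QuadVertex tl = { x,     y,     0.0f,  0.0f  };
    QuadVertex tr = { x + w, y,     tex_w, 0.0f  };
    QuadVertex bl = { x,     y + h, 0.0f,  tex_h };
    QuadVertex br = { x + w, y + h, tex_w, tex_h };
    buf[count + 0] = tl; buf[count + 1] = tr; buf[count + 2] = bl;
    buf[count + 3] = tr; buf[count + 4] = br; buf[count + 5] = bl;
    return count + 6;
}

/* Fill buf with every tile of a cols x rows grid of square tiles.
   The caller would then submit buf as one triangle list in a single call. */
size_t build_tile_batch(QuadVertex *buf, size_t capacity,
                        int cols, int rows, float tile,
                        float tex_w, float tex_h)
{
    size_t n = 0;
    for (int row = 0; row < rows; ++row) {
        for (int col = 0; col < cols; ++col) {
            n = append_quad(buf, n, capacity,
                            col * tile, row * tile, tile, tile,
                            tex_w, tex_h);
        }
    }
    return n;
}
```

A 200x200 map built this way holds 240,000 vertices, but it is handed to the driver once per frame instead of through 40,000 separate draw calls.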
I don't know enough about SDL to know why it works so much faster, but my best guess looking at the code is that it has very little overhead and thus can run into lots of potential problems depending on whatever else you're doing in the frame to adjust how rendering occurs. Allegro does a lot of stuff behind the scenes so that it's much harder to write code that doesn't work at all or gets screwed up by other calls.
But yeah, try not to exceed 10,000 textured sprites per frame when making a 2D game. (20,000 if you intend to force 30 FPS instead of allowing 60 FPS.) Otherwise, you're gambling on how well optimized the end-user's system is and what it can handle.
Please put your code in <code>code goes here...</code> tags. It makes it much easier to read.
You're not comparing SDL and Allegro in a useful way yet. You need to make the maps the same size, with the same tile size, and render them at the same position. The high CPU is probably because you're not holding drawing. If you don't hold drawing, then each and every draw call will use the CPU to upload the data to the GPU, which comes with overhead and slows the whole process down.
Also, you never want to render an entire tilemap, only the visible portion. You would never render 200 tiles across at 32 pixels wide each. Even at 1280x800 (my laptop res) that is only 40x25 = 1000 32x32 tiles. Not 40,000 like in your example.
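Here is one way the visible range could be worked out, as a small C sketch. `TileRange`, `visible_tiles` and `tile_count` are names invented for this example, and the camera is assumed to be the top-left corner of the view in map pixels.

```c
#include <assert.h>

/* Inclusive range of tile columns and rows that overlap the view.
   The range is empty when last < first. */
typedef struct { int first_col, last_col, first_row, last_row; } TileRange;

/* Integer division rounding toward negative infinity (b must be > 0). */
static int floor_div(int a, int b)
{
    int q = a / b;
    if (a % b != 0 && a < 0) --q;
    return q;
}

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* cam_x, cam_y: top-left of the view in map pixels.
   view_w, view_h: view size in pixels. tile: tile edge length in pixels. */
TileRange visible_tiles(int cam_x, int cam_y, int view_w, int view_h,
                        int tile, int map_cols, int map_rows)
{
    assert(tile > 0 && view_w > 0 && view_h > 0);
    TileRange r;
    r.first_col = clamp_int(floor_div(cam_x, tile), 0, map_cols);
    r.last_col  = clamp_int(floor_div(cam_x + view_w - 1, tile), -1, map_cols - 1);
    r.first_row = clamp_int(floor_div(cam_y, tile), 0, map_rows);
    r.last_row  = clamp_int(floor_div(cam_y + view_h - 1, tile), -1, map_rows - 1);
    return r;
}

/* Number of tiles inside the range (0 if it is empty). */
int tile_count(TileRange r)
{
    int cols = r.last_col - r.first_col + 1;
    int rows = r.last_row - r.first_row + 1;
    return (cols > 0 && rows > 0) ? cols * rows : 0;
}
```

For a 1280x800 view at the map origin with 32-pixel tiles on a 200x200 map, this gives columns 0 to 39 and rows 0 to 24, i.e. the 1000 tiles mentioned above. The drawing loops would then run over that range only.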
I tried to make a better test program, and here's the code :
Here's a screenshot of task manager recording the program running at 1024x768 with a 32x32 tile bitmap.
[Screenshot 608517, 816x453]
As you can see, baseline CPU usage is about 10%, 60 FPS is about 20% and full bore is about 40% CPU, which means it takes about 30% CPU on one core to run the program at 1024x768 without any pauses between screen draws. At full bore it was running at 200+ FPS.
You should try my program on OSX and see what kind of numbers you get. When running the program, you need a "test.bmp" in the same directory, of any size less than the screen resolution.
Usage :
MapRenderTest.exe
MapRenderTest.exe SCREEN_WIDTH SCREEN_HEIGHT TARGET_FPS
You can adjust the target frames per second by using the up and down keys. The g key lets you toggle whether the drawing waits for the render timer or not.
The console will output the real FPS and the average time it takes to draw a frame.
Oddly enough, if you minimize the program FPS goes up to 600 and CPU goes way up. But that's just because I don't pause on lost focus.