I'm just starting out with Allegro and have a very basic prototype for a shooter. It runs acceptably on my computer (for an I'm-still-learning-as-I-go prototype), but runs terribly on Windows (I use OS X). The difference is striking: on my Mac, it starts to slow down at ~50 objects on screen. Not great, but not too bad for a first shot. On Windows, though, it starts to slow down at ~13 objects, even on a computer with much better specs than mine.
I don't have any experience programming for Windows, so I'm not even sure where to begin looking for the problem. I don't do anything Mac-specific anywhere in the code.
gprof says it's spending a huge amount of time in gfx_directx_unlock_win.
Anyways, the source and Windows executable are here.
Are you drawing onto a video bitmap? If so, describe the operations you do with it (what kind of bitmaps, blending, transparency, per-pixel effects, ...).
It runs slowly on my P4 2400 MHz! But this problem is not Allegro-related.
Post the code you use to blit your buffer bitmap to screen.
I read your code a bit and found this thing:
```c
while (!game_end_flag) {
    LOCK_VARIABLE(t);
    LOCK_FUNCTION(inc_timer);
    install_int_ex(inc_timer, BPS_TO_TIMER(120));

    LOCK_VARIABLE(e);
    LOCK_FUNCTION(inc_enemy);
    install_int_ex(inc_enemy, BPS_TO_TIMER(120));

    input_update();
    // lots more stuff here that isn't interesting ATM
```
How often is that code executed? I assume it runs less often than once per frame, but if it runs more than once per application start-up, that isn't good: you should lock and install things once, at initialization time, unless you uninstall the timers somewhere else.
Also, as I suspected, you draw things to a video bitmap. Worse, it seems you are doing per-pixel work there (rectangles, triangles). Try using a regular memory bitmap for the backbuffer; it should give a considerable speed boost.
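To make the "lock and install once" advice concrete, here is a plain-C model of the usual Allegro 4 timer pattern. It is a sketch, not Allegro code: `fake_install_int()` and `fake_timer_fire()` are hypothetical stand-ins for `install_int_ex()` and for Allegro invoking the callback, so the logic compiles and runs without the library. Only `t` and `inc_timer` come from the code above; the second timer (`e`/`inc_enemy`) would follow the same pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C model of the Allegro 4 timer pattern (not real Allegro code).
 * In a real program, game_init() would call LOCK_VARIABLE(t),
 * LOCK_FUNCTION(inc_timer) and install_int_ex(inc_timer, BPS_TO_TIMER(120))
 * exactly once; here "installing" just records a function pointer. */

static volatile int t = 0;                 /* ticks not yet processed by the logic */
static void (*installed_cb)(void) = NULL;  /* hypothetical stand-in for Allegro's timer list */
static int install_count = 0;              /* how many times a timer was "installed" */

static void inc_timer(void) { t++; }

/* Hypothetical stand-in for install_int_ex(). */
static void fake_install_int(void (*cb)(void)) {
    installed_cb = cb;
    install_count++;
}

/* Runs once at startup, never inside the game loop. */
static void game_init(void) {
    assert(install_count == 0 && "timers should be installed only once");
    fake_install_int(inc_timer);
}

/* Simulates one timer interrupt; Allegro would invoke the callback itself. */
static void fake_timer_fire(void) {
    if (installed_cb) installed_cb();
}

/* One pass of the main loop: run one logic step per pending tick. */
static int game_loop_pass(void) {
    int steps = 0;
    while (t > 0) {
        /* input_update(); game logic goes here */
        t--;
        steps++;
    }
    return steps;
}
```

In a real Allegro program the body of `game_init()` would hold the `LOCK_VARIABLE`/`LOCK_FUNCTION`/`install_int_ex` calls, and the loop would consume pending ticks the same way before drawing.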
It runs slowly on my P4 2400 MHz! But this problem is not Allegro-related.
Post the code you use to blit your buffer bitmap to screen.
Yeah, I figured it was my fault, not Allegro's.
Here's pretty much the whole display routine, cut & paste from a couple different files. There's really nothing complicated going on.
```objc
static BITMAP *sandbox, *bkg;

// at initialization: sandbox is a sub-bitmap of the screen (i.e. video memory)
sandbox = create_sub_bitmap(screen, screen_x, screen_y, screen_width, screen_height);
bkg = create_bitmap(screen_width, screen_height);

void display_update() {
    Particle *ship = get_first_object();
    while (ship) {
        [ship erase: sandbox to: bkg];  // restore background under the old position
        [ship draw: sandbox];           // draw at the new position
        ship = [ship next];
    }
}

-(void)erase: (BITMAP *)front to: (BITMAP *)back
{
    blit(back, front, drawn_x, drawn_y, drawn_x, drawn_y, drawn_l, drawn_h);
}

-(void)draw: (BITMAP *)bmp {
    triangle(bmp, x, y + sizey, x + 10, y + sizey, x + 5, y, color);
    [self save_state];
}
```
How often is that code executed? I assume it runs less often than once per frame, but if it runs more than once per application start-up, that isn't good: you should lock and install things once, at initialization time, unless you uninstall the timers somewhere else.
Also, as I suspected, you draw things to a video bitmap. Worse, it seems you are doing per-pixel work there (rectangles, triangles). Try using a regular memory bitmap for the backbuffer; it should give a considerable speed boost.
Yeah, the timer initialization there was a mistake. It should be in the game initialization function. Those are the only two timers I have, at least thus far.
So I'm basically drawing directly to the screen, yes? Since I draw to the sandbox & that's a sub-bitmap of the screen. Are you saying I should draw to a system-memory bitmap first, and then blit that to the screen?
Ultimately, most of these are going to be pre-rendered bitmaps anyways, but are the drawing functions really that expensive? Or is it just because of how I'm going about it?
Alright, I have some stuff to try. Thanks!
So I'm basically drawing directly to the screen, yes? Since I draw to the sandbox & that's a sub-bitmap of the screen
yes
Are you saying I should draw to a system-memory bitmap first, and then blit that to the screen?
yes
Ultimately, most of these are going to be pre-rendered bitmaps anyways, but are the drawing functions really that expensive?
They are if you perform them on video bitmaps. The reason is that to get per-pixel access to a video bitmap you first have to lock it. That probably means the bitmap is downloaded from video RAM to system RAM, updated, and uploaded back, and that download-modify-upload cycle is done once for every drawing command. The drawing itself should be quite cheap. That is also why gprof showed most of the time being spent in gfx_directx_unlock_win.
[edit]
One simple thing you can do is make the sandbox a regular system/memory bitmap and see how that works. It should be considerably faster than your current method.
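A rough estimate of why this matters, taking the reply's "probably" at face value: assume each lock copies the whole locked surface from video RAM to system RAM and back (an assumption, not something measured in this thread). Drawing $k$ primitives directly onto a $W \times H$ video bitmap with $B$ bytes per pixel then moves roughly

$$T_{\text{direct}} \approx 2kWHB \quad\text{bytes, versus}\quad T_{\text{backbuffer}} \approx 2WHB$$

for drawing into a memory bitmap and locking the screen only for the final blit. For example, with a hypothetical 640×480 play area (the thread never gives screen_width and screen_height), 32-bit color ($B = 4$, consistent with `_linear_blit32` in the profiles) and 13 objects doing one erase blit and one triangle each ($k = 26$), that is $2 \cdot 26 \cdot 640 \cdot 480 \cdot 4 = 63{,}897{,}600$ bytes (about 64 MB) per redraw, against $2 \cdot 640 \cdot 480 \cdot 4 = 2{,}457{,}600$ bytes (about 2.5 MB): a factor of $k$.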
Hmm. I tried it, and the performance change is negligible on OS X (i.e., it's still acceptable). Sadly, it's also negligible on Windows (at least on the laptop I'm using). It still slows down at ~13 objects or so, and gprof says pretty much the same thing:
```
  %   cumulative   self
 time   seconds   seconds  name
 27.72      1.12     1.12  gfx_directx_unlock_win
 21.53      1.99     0.87  _linear_blit32
  7.18      2.28     0.29  blit
```
...etc.
Here's what I've changed:
```objc
sandbox = create_bitmap(screen_width, screen_height);
bkg = create_bitmap(screen_width, screen_height); // this line's still the same

-(void)erase: (BITMAP *)front to: (BITMAP *)back
{
    blit(back, front, drawn_x, drawn_y, drawn_x, drawn_y, drawn_l, drawn_h);
    blit(back, screen, drawn_x, drawn_y, drawn_x, drawn_y, drawn_l, drawn_h);
}

-(void)draw: (BITMAP *)bmp {
    triangle(bmp, x, y, x + sizex, y, x + sizex / 2, y + sizey, color);
    [self save_state];
    blit(bmp, screen, drawn_x, drawn_y, drawn_x, drawn_y, drawn_l, drawn_h);
}
```
Oh yeah, and I moved the timer initializations to their proper place, i.e. game_init().
I guess I'll try making the bitmaps explicitly system-memory bitmaps by creating them with create_system_bitmap() instead of plain create_bitmap(). But I've got to be doing something seriously wrong somewhere. What I don't get is why the disparity in performance between OS X and Windows is so large.
I know I could get a performance boost by using pre-rendered bitmaps instead of drawing the objects, but that doesn't look like it would actually solve my problem, so I'm not going to worry about that yet.
**EDIT**
Ok. Reading some of the docs, it sounded like using acquire_screen() and release_screen() might help. I bookended the display loop with those. The performance is pretty much the same, but now gprof reads thus:
```
  %   cumulative   self
 time   seconds   seconds  name
 18.18      0.70     0.70  blit
 10.13      1.09     0.39  _linear_hline32
  9.35      1.45     0.36  _soft_polygon
  8.57      1.78     0.33  objc_msg_lookup
  5.71      2.00     0.22  ddraw_blit_to_self
```
... and so on. Pretty much the same, except w/the first two lines gone.
I don't quite get what you've changed. Do you blit stuff to both front (the backbuffer?) and the screen? You should blit your backbuffer to the screen only once per frame, not a single time more.
Usually things go like this:
1. Clear the backbuffer.
2. Blit your images to the backbuffer.
3. Blit the backbuffer to the screen.
4. Repeat.
The backbuffer and images are all regular memory or system bitmaps; the only video bitmap is the screen, which Allegro creates for you.
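The loop above can be sketched in plain C, with a hypothetical `Surface` struct standing in for a memory `BITMAP`, a clipped rectangle fill standing in for `rectfill()`/`triangle()`, and `memcpy` standing in for the final `blit(buffer, screen, ...)`. It is a model under those assumptions, not Allegro code; the point is that all per-object drawing touches system memory and the screen is written exactly once per frame.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for a memory BITMAP (32-bit pixels); not Allegro's struct. */
typedef struct { int w, h; uint32_t *px; } Surface;

static Surface surface_create(int w, int h) {
    Surface s;
    s.w = w;
    s.h = h;
    s.px = calloc((size_t)w * (size_t)h, sizeof *s.px);
    assert(s.px != NULL);
    return s;
}

/* Step 1: clear the backbuffer (clear_to_color() in Allegro). */
static void surface_clear(Surface *s, uint32_t color) {
    for (size_t i = 0; i < (size_t)s->w * (size_t)s->h; i++)
        s->px[i] = color;
}

/* Step 2 primitive: clipped filled rectangle, standing in for rectfill()/triangle(). */
static void surface_rectfill(Surface *s, int x0, int y0, int x1, int y1, uint32_t color) {
    if (x0 < 0) x0 = 0;
    if (y0 < 0) y0 = 0;
    if (x1 > s->w - 1) x1 = s->w - 1;
    if (y1 > s->h - 1) y1 = s->h - 1;
    for (int y = y0; y <= y1; y++)
        for (int x = x0; x <= x1; x++)
            s->px[(size_t)y * (size_t)s->w + (size_t)x] = color;
}

static int screen_blits = 0;   /* counts full-screen copies, to check "once per frame" */

/* Step 3: one full copy to the screen, like blit(buffer, screen, 0, 0, 0, 0, w, h). */
static void blit_to_screen(const Surface *buf, Surface *screen) {
    assert(buf->w == screen->w && buf->h == screen->h);
    memcpy(screen->px, buf->px, (size_t)buf->w * (size_t)buf->h * sizeof *buf->px);
    screen_blits++;
}

typedef struct { int x, y, size; uint32_t color; } Object;

/* One frame: clear, draw every object into memory, then blit to the screen once. */
static void render_frame(Surface *back, Surface *screen, const Object *objs, int n) {
    surface_clear(back, 0);
    for (int i = 0; i < n; i++)
        surface_rectfill(back, objs[i].x, objs[i].y,
                         objs[i].x + objs[i].size - 1, objs[i].y + objs[i].size - 1,
                         objs[i].color);
    blit_to_screen(back, screen);
}
```

However many objects are drawn, `screen_blits` goes up by exactly one per `render_frame()` call, which is the property the reply is asking for.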
Pretty much the same, except w/the first two lines gone.
That's odd. Together those function timings are nowhere near 100% usage. Are you sure you didn't miss something?
ohmygod.
I think I found the problem.
BANGS HEAD ON DESK
I'll be with you again in a moment.
...
So yeah. I had the display_update() function called INSIDE the game logic loop, which means it would loop through all the objects, draw them, and draw to the screen ONCE FOR EACH OBJECT. No wonder it slowed down the more objects there were on screen: with n objects I was redrawing the screen n times, i.e. doing n² object draws, each time step.
But even so, the OS X version was considerably faster than the Windows version.
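A toy count of what calling display_update() from inside the per-object logic loop costs, compared with updating everything first and drawing once. The function bodies are hypothetical (they count draws instead of drawing); only the name `display_update()` comes from the thread.

```c
#include <assert.h>

/* Toy model of the bug: counts object draws instead of drawing. */
static long draws = 0;

/* Redraws every object once (erase + draw in the real code). */
static void display_update(int n_objects) {
    for (int i = 0; i < n_objects; i++)
        draws++;
}

/* Buggy version: display_update() sits inside the per-object logic loop. */
static long time_step_buggy(int n_objects) {
    draws = 0;
    for (int i = 0; i < n_objects; i++) {
        /* update object i */
        display_update(n_objects);
    }
    return draws;
}

/* Fixed version: update all objects, then draw the frame once. */
static long time_step_fixed(int n_objects) {
    draws = 0;
    for (int i = 0; i < n_objects; i++) {
        /* update object i */
    }
    display_update(n_objects);
    return draws;
}
```

With 13 objects the buggy step performs 169 object draws versus 13 for the fixed one; with 50 objects it is 2500 versus 50, so the cost grows quadratically instead of linearly.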
I suggest you read up on the Allegro functions for locking and unlocking bitmaps:
acquire_bitmap(bmp); release_bitmap(bmp);
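A toy model of why bracketing a batch of drawing calls can help, assuming the lock on a video bitmap is reference-counted so that only the outermost acquire and release do the expensive work. `VideoBitmap`, `model_acquire()` and the other names are hypothetical; this is not Allegro's implementation. As the earlier acquire_screen()/release_screen() experiment showed, removing the per-call lock overhead did not by itself fix the slowdown here.

```c
#include <assert.h>

/* Toy model of a video bitmap whose lock is reference-counted (hypothetical). */
typedef struct { int lock_depth; int real_locks; } VideoBitmap;

static void model_acquire(VideoBitmap *b) {
    if (b->lock_depth++ == 0)
        b->real_locks++;   /* only the outermost acquire does the expensive lock */
}

static void model_release(VideoBitmap *b) {
    assert(b->lock_depth > 0);
    b->lock_depth--;       /* the expensive unlock would happen when depth reaches 0 */
}

/* Each primitive locks and unlocks its target itself, like an unbracketed draw call. */
static void model_draw_primitive(VideoBitmap *b) {
    model_acquire(b);
    /* ...write pixels... */
    model_release(b);
}

/* Returns how many real lock cycles one frame of `primitives` draws costs. */
static int frame_real_locks(int primitives, int bracket) {
    VideoBitmap screen = {0, 0};
    if (bracket) model_acquire(&screen);
    for (int i = 0; i < primitives; i++)
        model_draw_primitive(&screen);
    if (bracket) model_release(&screen);
    return screen.real_locks;
}
```

For example, 26 unbracketed primitives cost 26 lock cycles in this model, while the same 26 inside one acquire/release pair cost 1.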
AE.
I always suspected there was some major problem somewhere. Things shouldn't slow down with so few objects on screen. Still, glad you've found the problem.