The age-old 100% CPU usage problem - but still no solution
jamyskis

Hi all,

I'm back on the forum after a long absence due to (ahem!) illness, this time with an age-old question which I've seen posted on this forum dozens of times but never really explained in a way I could understand. I've managed to fix the speed of my game, but performance is choppy because, for whatever reason, the game is taking up 100% of the CPU time. I've made the game logic updates event-based, so they are called by an interrupt at regular intervals, and I've tried cutting out game_display() (the function that contains all the code dealing with drawing the screen) completely to see if the drawing was taking too much time (it throws around 50-60 sprites at a time, so that would be understandable, especially under Linux with no GFX hardware acceleration). Even without any drawing functions, the game still swallows up 100% of the CPU time.

This is the main.cc as it stands:

/* Open Invaders
 * (c) 2006 Darryl LeCount
 *
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 */

#include "allegro.h"
#include "functions.h"
//#include "./alogg/alogg.h"
#include <iostream>

int fullscreen_mode, frames_missed;

using namespace std;

void interrupt_time_control()
{
    frames_missed++;
}
END_OF_FUNCTION(interrupt_time_control);

int main(int argc, char *argv[])
{
    LOCK_FUNCTION(interrupt_keys);
    LOCK_FUNCTION(interrupt_time_control);

    LOCK_VARIABLE(frames_missed);

    frames_missed=0;
    fullscreen_mode=0;

    if(argc>1)
    {
        if(argv[1][0]=='-')
        {
            fullscreen_mode=0;

            switch(argv[1][1])
            {
                case 'f': fullscreen_mode=1; break;
                case 'w': fullscreen_mode=2; break;
            }
        }
    }

    initialise_game();
    cout << "Allegro initialised...\n";

    display_setup(fullscreen_mode);
    cout << "Allegro display established...\n";

    predefine_variables();
    create_bitmasks();
    cout << "Collision bitmasks initialised...\n";

    cout << "Have fun!\n";

    intro_sequence();

    while(program_still_active())
    {
        title_screen();
        predefine_variables();
        reset_enemies_position();
        reset_enemies_state();
        initialise_ingame_music();
        install_int(interrupt_time_control,4);

        while(game_still_active())
        {
            for(int repeats=0;repeats<frames_missed;repeats++)
            {
                update_logic();
            }

            game_display();

            frames_missed=0;

            rest(1);
            vsync();
        }
    }

    cout << "Thank you for playing!\n";

    allegro_exit();
    return 0;
}
END_OF_MAIN();

Interrupt_keys() looks thus:

void interrupt_keys()
{
    char nameext[20];

    if(key[KEY_LCONTROL]&&key[KEY_S])
    {
        sprintf(nameext,"oi_screen_%d.bmp",(rand()%8998)+1000);
        save_bitmap(nameext,screen,gamepalette);
    }

    if(key[KEY_LCONTROL]&&key[KEY_C])
    {
        program_active=false;
        game_active=false;
        abort();
    }
}

Interrupt_time_control() is just this:

void interrupt_time_control()
{
  frames_missed++;
}

I've been playing with this for months and am just about ready to chuck it in and delete the lot. Can anyone give me a hand figuring out why I can't stabilise the frame rate and bring down the CPU usage? If there's anything else I should paste in, let me know.

Thanks!

Darryl

Hard Rock

That's because, by default, your program will use 100% of the CPU even if it's just repeating a loop. It's not just repeating the loop, it's repeating it as fast as possible.

To give up CPU cycles to other programs you need to call rest(), or you can also use sleep(), although I'm not sure which header the latter is in - too much Java programming :S

Anyway more on rest: http://alleg.sourceforge.net/stabledocs/en/alleg005.html#rest

[edit]

Whoops, just noticed you already have a rest(). Drop the vsync(). I believe some forum posters have mentioned that vsync() allows timing, but it also consumes 100% CPU while it waits. Since vsync() doesn't return until it's done, rest() literally only gets called 60 times per second, if that.

[edit2]
There was also a recent post on sample game loops. It's not 100% relevant, and looking through it I didn't really find any stellar examples (most of it is pseudocode), but it might help. http://www.allegro.cc/forums/thread/590871

[edit3]
If neither of these helps, I'll take my GWARA Tins entry and modify it so it doesn't use 100% CPU, then post the updated code. Or try to, at least. Just let me know.

Kris Asick

Your frame-dropping algorithm fails to take into account whether the logic processing is taking too long. Frame dropping helps when the rendering takes too long, but if the logic takes too long, more logic updates will be requested while logic is still being processed, resulting in potentially large numbers of logic loops per frame, which is possibly the source of your framerate troubles and your 100% CPU usage.

What you need to do is put a limit on how many logic updates you'll allow in one game loop; 4 is a good limit. This prevents your logic loop from over-processing. It will still result in slow-down if your system can't handle the amount of logic that needs to be processed, but that will be true of any game using fixed-time logic.
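Applied to the loop in the first post, that might look something like this (a sketch only; MAX_FRAMESKIP is a name I'm introducing, and the limit of 4 is the one suggested above):

/* Sketch: cap the catch-up so a slow machine drops ticks instead of spiralling.
   MAX_FRAMESKIP is hypothetical, not from the original code. */
#define MAX_FRAMESKIP 4

while(game_still_active())
{
    int updates = frames_missed;
    if(updates > MAX_FRAMESKIP)
        updates = MAX_FRAMESKIP;      /* never run more than 4 logic ticks per loop */

    for(int repeats=0; repeats<updates; repeats++)
        update_logic();

    game_display();

    frames_missed = 0;                /* discard any backlog we chose to skip */

    rest(1);
}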

You may also want to check and make sure you're not running 32-bit graphics. 16-bit graphics will run almost twice as fast as 32-bit on any system. Also, avoid odd colour depths such as 15-bit or 24-bit, which may invoke compatibility layers if your video card can't handle them by default.

Also, vsync() will not eat up so much CPU time that you'll get 100% usage so long as you have rest() or Sleep() statements to give time back to the OS. However, you should place your rest() statement immediately after rendering, not right before vsyncing.

--- Kris Asick (Gemini)
--- http://www.pixelships.com

Goalie Ca

While your program is running, it counts towards CPU usage. Yielding/resting tells the operating system to stop running your program for the time being; while it's doing nothing, it's using 0%.

Here is a fix that shouldn't affect anything. It works exactly the same, except that when the count is 0 it returns control to the operating system, and when the count goes back above 0 it wakes the process up. Just replace the SDL semaphore with one from your operating system, be it Windows or Linux (I can help you if you need). I used the evil SDL because Allegro has nothing similar/portable, IIRC.

//the timer function: SDL_SemPost will "increment" the semaphore count
Uint32 SDLTimer_Callback(Uint32 interval, void* param)
{
  SDL_SemPost(sleepSemaphore); //will wake up the other thread. "increments" the count
  return interval;
}  

//the main loop
while ( !quitMe )
{
  SDL_SemWait(sleepSemaphore); //sleeps if count = 0, when it is no longer 0 it wakes up and decrements the count
  gameLogic();
}

edit: I should explain that sleepSemaphore is basically just a counter. It is used for synchronising between threads, but it works perfectly fine in a situation like this. This uses < 1% of my CPU in most cases (OpenGL for hardware drawing helps too!)
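For completeness, the setup that snippet assumes looks roughly like this (a sketch with error checking omitted; the 60 Hz interval is just an example):

#include <SDL.h>

SDL_sem* sleepSemaphore = NULL;

// ...during initialisation, before entering the main loop:
SDL_Init(SDL_INIT_TIMER);
sleepSemaphore = SDL_CreateSemaphore(0);          // count 0: the loop sleeps until the first tick
SDL_AddTimer(1000 / 60, SDLTimer_Callback, NULL); // post the semaphore ~60 times a second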

edit2: rest(1) doesn't sleep for 1 ms, especially on Windows. Windows is not a real-time operating system; you'll only get accuracy to ~10 ms or so. So 10 ms, 20 ms, 30 ms, etc...

Kitty Cat
Quote:

I used the evil SDL because Allegro has nothing similar/portable, IIRC.

But there's already a perfectly valid and portable way to do it...

gnolam

And where might semaphore.h be in MinGW? In MSVS?

Kitty Cat
Goalie Ca

Actually, that's a pretty good thing. Windows is the only platform (other than embedded) that doesn't support pthreads. The only reason why I didn't use sem_wait was because almost all n00bs who post use Windows. It's a fact! But I hope that works for VC (or whatever they use).

If it does... then it's about time to make a wiki entry and end this line of threads once and for all!

Steve++
Quote:

The only reason why I didn't use sem_wait was because almost everyone who posts uses Windows.

Fixed :)

A game with real-time performance requirements using 100% CPU time isn't a problem in itself. In fact, when your game yields its current timeslice to the OS, it must wait until the OS's next round of scheduling. That may be ok for tic-tac-toe. How demanding do you think your game will be on the CPU?

Goalie Ca

Putting the thread to sleep and then waking it up is not a problem. In fact, since the timer runs in a separate thread, it gets woken up, and then the main-loop thread has to be rescheduled anyway. When the sem_post occurs, it changes the thread's state to ready - the same as if it had been pre-empted because its timeslice expired.

Windows is not real-time, not anything close, so you have to live with what you get, I suppose. I guess this is why the thread about delta timing came up, where you use gettimeofday() and find out how much time really elapsed (in milliseconds). But there are problems with pausing and scheduling (if you're relying on small differences rather than averages, you'd need to lock everything up during that computation sequence).
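A minimal sketch of that delta-timing idea, assuming a POSIX system (the helper and the "running" flag are mine; update_logic() and game_display() are the OP's functions):

#include <sys/time.h>

/* Milliseconds of wall-clock time; 64-bit so it doesn't overflow. */
static long long now_ms(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000 + tv.tv_usec / 1000;
}

/* In the main loop: run fixed logic ticks for however much real time passed. */
const long long MS_PER_TICK = 1000 / 60;
long long last = now_ms();
long long accum = 0;

while (running)
{
    long long now = now_ms();
    accum += now - last;
    last = now;

    while (accum >= MS_PER_TICK)   /* catch up on elapsed real time */
    {
        update_logic();
        accum -= MS_PER_TICK;
    }

    game_display();
}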

Thomas Harte
Quote:

A game with real-time performance requirements using 100% CPU time isn't a problem in itself.

If your game is going to make my fans come on, it had better look like Half-Life 2 or I won't be loading it again.

Goalie Ca

I sucked it up and wrote an Allegro wiki entry under Timers. http://wiki.allegro.cc/Timers#Yielding_the_CPU

edit: and some simple test code to measure performance. Here's what time outputs:
real 0m59.941s
user 0m0.040s
sys 0m0.012s

So it runs for about 1 minute of real time, spends 0.012 seconds doing system calls, and spends 0.040 seconds actually executing my code! I think that's a pretty good start :D

That gives a CPU usage of basically 0% (about 0.05 seconds of CPU time over 60 seconds of real time - under 0.1%).

#include <allegro.h>
#include <semaphore.h>
//for printing seconds since 1970 or whatever
#include <time.h>

//number of cycles per second
#define BPS 60

//create the semaphore
sem_t timer_sem;

void ticker(void)
{
    sem_post(&timer_sem);
}
END_OF_FUNCTION(ticker);


int main(int argc, char** argv)
{
    sem_init(&timer_sem, 0, 1); //initialise the semaphore, set the tick count to 1
    allegro_init();

    LOCK_FUNCTION(ticker);

    install_timer();
    install_keyboard();
    set_color_depth(24);
    set_gfx_mode(GFX_AUTODETECT_WINDOWED, 640, 480, 0, 0);
    install_int_ex(ticker, BPS_TO_TIMER(BPS));
    unsigned char doLoop = 0xFF;

    while(doLoop)
    {
        sem_wait(&timer_sem); //sleeps while the count is 0; wakes up and decrements it otherwise
        if (key[KEY_ESC])
            doLoop = 0;
        //do stuff here
        rectfill(screen, 1, 1, 200, 20, makecol(0,0,0) );
        textprintf_ex(screen, font, 10, 10, makecol(255, 100, 200),-1, "Time: %d", (int)time(NULL) );
        //end of loop!
    }

    return 0;
}
END_OF_MAIN();

CGamesPlay

The thing about sem_wait is that if it gets behind, you get really funny speedups as it plays catch-up.

Kibiz0r

Like when you alt-tab back to Diablo 2 after leaving it minimized for a few seconds?

GullRaDriel
Kibiz0r said:

Like when you alt-tab back to Diablo 2 after leaving it minimized for a few seconds?

Like that.

jamyskis

OK, this is the third time I've typed this answer...Allegro.cc keeps logging me out for whatever reason and the answer gets lost...

Anyway, the only solution I've found that effectively yields the CPU is to omit the vsync() and include rest(1). However, update_logic() was then executed so seldom that frames_missed got really high, and as a result I had frame rates of around 1 frame every five seconds. I tried limiting frames_missed to four, but that just made the game extremely slow and extremely choppy, although CPU usage was at about 80% there.

Edit: I've just noticed that the title screen, which displays a large sprite in the middle of the screen, two textout_ex calls and 500 pixel "stars" in the background, is chomping up 90% CPU time on an Athlon 2400 - is Allegro's Linux graphics driver really that bad?

Edit 2: OK, I have the game running doing everything except blitting the backbuffer to the screen, and it runs at a fairly respectable 45% CPU usage. As soon as I add blit(display,screen,0,0,0,0,800,600), though, it shoots up to 90-95%... isn't there a faster way to blit display to screen?

Goalie Ca

Clippy says: "It looks like you are trying to profile a program. Would you like help?"

a) Use a profiler to actually determine how much time is spent in each function.
b) Understand how CPU time is measured, and improve your yielding (rest() doesn't cut it!).

jamyskis

Thanks - I'll give it a go, if only to bring the 45% down a bit. Still, I'd love to know why a single Allegro function call is chomping up 50% of my CPU time. Just chopping out that single blit call brought it down from 95% to 45%...

Kitty Cat
Quote:

The thing about sem_wait is that if it gets behind, you get really funny speedups as it plays catch-up.

How so? Can't you do frame dropping?

sem_wait(&my_sem);
do {
    logic();
} while (sem_trywait(&my_sem) == 0);

draw();

Goalie Ca
Quote:

I'll give it a go, if only to bring the 45% down a bit. Still, I'd love to know why a single Allegro function call is chomping up 50% of my CPU time. Just chopping out that single blit call brought it down from 95% to 45%..

What I actually mean is: how can you really be sure you're actually spending that much CPU time in a single blit? I really doubt it.

rest() is not an accurate function. Windows is not a real-time operating system (though Linux has real-time patches). This means that if you rest(1), it may take 20 ms to return to the program, or it may return right away, execute a loop, go to sleep, return right away again, etc. This is all scheduling-dependent, so maybe adding a blit changes how Windows schedules your process.
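It's easy to check this for yourself. Something like the following (a sketch for Linux, using gettimeofday()) prints how long each rest(1) really takes:

#include <stdio.h>
#include <sys/time.h>
#include <allegro.h>

/* Time 100 calls to rest(1) against the wall clock. */
struct timeval before, after;
for (int i = 0; i < 100; i++)
{
    gettimeofday(&before, NULL);
    rest(1);
    gettimeofday(&after, NULL);
    long us = (after.tv_sec - before.tv_sec) * 1000000L
            + (after.tv_usec - before.tv_usec);
    printf("rest(1) actually slept %ld us\n", us);
}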

A profiler will actually tell you how much time you spend in each function. In my example code above, I have a simple rectfill. When I change the rectfill to cover the entire screen (with a grey colour), my CPU usage becomes:

real    1m0.350s
user    0m0.884s
sys     0m0.052s

So.. there's some expense in rectfill. Blitting should be comparable in runtime.

Now gprof (my profiler!) outputs:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
  0.00      0.00     0.00     4149     0.00     0.00  rectFill
  0.00      0.00     0.00     4149     0.00     0.00  textFill
  0.00      0.00     0.00        1     0.00     0.00  mainLoop

Now that's funny. The time spent is too small to accurately measure. :D

edit: there must be a way to do fixed-width fonts in this forum!

edit2: thanks, BAF! and weird... the post preview doesn't use the same font.

BAF

You can use the [pre][/pre] tags.

Pretty fixed widthness!

CGamesPlay
Quote:

How so? Can't you do frame dropping?

No, there's no way to set a semaphore back to 0 (aside from repeatedly polling it - so this would work, but it looks ugly):

sem_wait(&sem);
while(sem_trywait(&sem) != -1);
logic();

Kitty Cat

Frame dropping doesn't mean dropping logic frames, it means dropping rendered frames :P If you want to implement a maximum-allowed skip:

sem_wait(&my_sem);
do {
    logic();
} while (sem_trywait(&my_sem) == 0 && ++skip < MAX_SKIP);

if (skip >= MAX_SKIP)
    while (sem_trywait(&my_sem) == 0) /* do nothing */;
skip = 0;

draw();

Alternatively, instead of the empty sem_trywait loop, you could destroy and re-init the semaphore, but that's likely not efficient.
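That alternative would look something like this (a sketch; note that if the timer thread calls sem_post() between the two calls, the behaviour is undefined, which is another reason to prefer the sem_trywait() drain):

/* Reset the semaphore count to zero by recreating it. */
sem_destroy(&my_sem);
sem_init(&my_sem, 0, 0);   /* pshared = 0, initial count = 0 */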

jamyskis

OK, I've profiled it, and I'm still trying to make head or tail of the output. The interesting bit seems to be:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 81.27      0.13     0.13                             pmask_load_func
  6.25      0.14     0.01     1762     5.68     5.68  check_for_next_level()
  6.25      0.15     0.01      875    11.43    11.43  display_background()
  6.25      0.16     0.01                             main
  0.00      0.16     0.00     1762     0.00     0.00  read_input()
  0.00      0.16     0.00     1762     0.00     0.00  process_ufo()
  0.00      0.16     0.00     1762     0.00     5.68  update_logic()
  0.00      0.16     0.00     1762     0.00     0.00  check_if_game_over()
  0.00      0.16     0.00     1762     0.00     0.00  collision_detection()
  0.00      0.16     0.00     1762     0.00     0.00  move_automatic_items()
  0.00      0.16     0.00     1762     0.00     0.00  check_if_extra_life_due()
  0.00      0.16     0.00     1762     0.00     0.00  process_enemy_projectiles()
  0.00      0.16     0.00      876     0.00     0.00  game_still_active()
  0.00      0.16     0.00      875     0.00    11.43  game_display()

pmask_load_func obviously belongs to pmask, although I can't imagine why it's reported as taking up 81.27% of the processing time. Commenting out the call to collision_detection() (which contains the only references to pmask.h in the entire loop) does nothing for the performance; commenting out the blit call reduces usage by around 50%.

CGamesPlay

It looks like you just started the game and quit it and called it "profiling". You have to actually run the code for a while.

jamyskis

I started the game as usual, played through three levels, lost, went back to the title screen and then exited the game cleanly. I know the profiler needs to have had the chance to exercise all of the available functions, and it did get that chance...

Edit: I've attached a newly created profile which I created from playing the game for 15 minutes straight.

Goalie Ca

Well, I'm not exactly sure what all those functions do or how the code is structured, but it looks to me like there really is no bottleneck per se in your code. Just look at the cumulative time: if your program ran for 15 minutes, none of those functions really ate all that much time.

 total     the average number of milliseconds spent in this
ms/call    function and its descendents per call, if this 
	   function is profiled, else blank.

So yours is actually listed in microseconds (us) per call. If you add up the drawing commands, they come nowhere near 7 minutes of total time, which is what roughly 50% CPU usage over 15 minutes would imply (game_display(), for instance, is 875 calls at 11.43 us each - about 10 ms in total).

Now, I'm assuming your game loop is in the main function. If you took it out of main and put it in its own function, you could easily use the first table and fill in some of the gaps.

But it doesn't look like your code is slow at all. The "reported" CPU usage is probably coming from system scheduling.

jamyskis

OK, I used "top" to find the CPU usage of each process, and I found out something rather interesting - the game itself never takes up more than 60% of the CPU, even at the most intense moments. Xorg, on the other hand, shoots up when the game is running, for whatever reason. I'm not sure how I can log the same thing running in fullscreen mode. Anyway, I don't think I can hope for much better performance at this time, short of switching the whole thing to OpenGL (which I'd rather avoid, given that it shouldn't really be necessary).

Anyway, thanks to all those that contributed with tips and help!

Don Freeman

I would just do something like:

int main(int argc, char *argv[])
{
    allegro_init();
    set_color_depth(32);
    set_gfx_mode(GFX_AUTODETECT_WINDOWED,640,480,0,0);
    set_color_conversion(COLORCONV_TOTAL);
    text_mode(-1);
    install_timer();
    install_keyboard();
    install_mouse();
    textprintf(screen,font,0,0,makecol(255,255,255),"Press ESC to exit...");
    show_mouse(screen);

    bool quit = false;
    bool redraw = true;

    BITMAP *imageBuffer = create_bitmap(SCREEN_W,SCREEN_H);
    clear_bitmap(imageBuffer);

    while ( !quit )
    {
        if ( keypressed() )
        {
            if ( key[KEY_ESC] )
            {
                quit = true;
                clear_keybuf();
            }
            redraw = true;
            clear_keybuf();
        }
        if ( redraw )
        {
            redraw = false;
            clear_bitmap(imageBuffer);
            // Do drawing...

            // End drawing...
            blit(imageBuffer,screen,0,0,0,0,imageBuffer->w,imageBuffer->h);
        }
        else
        {
            // Update AI and yield unused time to the system...
            rest(1);
        }
    }
    destroy_bitmap(imageBuffer);
    return 0;
}
END_OF_MAIN()

This tells the system to yield until some event (redraw) - in this case a key press - has occurred. It is up to you and your game logic to set this variable when you need to take control of the processing again. rest(0) and yield_timeslice() have never seemed to work for me in Windows or Linux (several different flavours)... just pass rest(1) and I get the CPU to go to 0%, even during game play!

le_y_mistar

I haven't done any game programming in a while, so take my question with a grain of salt.

Wouldn't it be better to use threads instead of a loop plus timer/frame-rate counter?

Don Freeman

I don't think there IS a right or wrong answer to most of game programming... it is about what is right for YOU. I finally decided this after reading posts and books, all of which contradict each other. It is ultimately up to you and what you like: YOU are the one who has to code it and probably maintain it.
Each choice you have asked about has its weaknesses and strengths. It is up to you to decide which problems and hurdles you wish to face... I DO know that threads can get you into a LOT of trouble if you are not careful, but you can do a lot of stuff you might not be able to do (at least not easily) without them... I would also imagine threads are harder to debug as well, but I am certainly not a master of the art... :)

gnolam
Don Freeman said:

rest(0) and yield_timeslice() have never seemed to work for me in Windows or Linux (several different flavours)...

That's because they're not supposed to reduce CPU usage. They yield to other processes. That's it.

Goalie Ca

Interesting, though... Goalie's method of using semaphores seems to work quite well. If you need finer-grained control, implement your own semaphore class by wrapping one.

jamyskis

Well, as I understand it, three ways have been suggested here to reduce CPU usage:

(a) Use rest(1) somewhere in the game loop. The least effective, but the easiest to implement.

(b) Use the Semaphore library. More effective, but a bit more difficult to use. I may tackle this when I'm a little more confident with my programming.

(c) Get straight down to the grit of it and use threads. The most efficient method possible, but can be extremely unstable if you don't know what you're doing.

CGamesPlay
Quote:

(b) Use the Semaphore library. More effective, but a bit more difficult to use. I may tackle this when I'm a little more confident with my programming.

What's this "Semaphore library" you're talking about? You mean pthreads.

Quote:

(c) Get straight down to the grit of it and use threads. The most efficient method possible, but can be extremely unstable if you don't know what you're doing.

I fail to see how this reduces CPU usage. In fact, it increases CPU usage, as now you have two busy loops running around :P

Thomas Harte
Quote:

Well, as I understand it, three ways have been suggested here to reduce CPU usage:

My method is, as it has always been:

  • do a frame draw

  • read a counter, compute number of milliseconds/whatever since last here in loop

  • if now one or more logic ticks behind, do that many logic ticks

  • otherwise sleep for the number of milliseconds/whatever left before next update

In SDL I use SDL_GetTicks, which returns the number of milliseconds since the app started. In Allegro I have a second thread that updates a timer that is much coarser than milliseconds but gets the job done. My ChristmasHack '05 entry, Nuclear Attack! (Windows, OS X), is one example. On a virtualised copy of Windows 2000 on my MacBook Pro, it uses less than 5% of one CPU core.
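In SDL, that loop might be sketched like this (MS_PER_TICK and the function names are illustrative, not from Nuclear Attack!):

const Uint32 MS_PER_TICK = 1000 / 60;   /* fixed logic rate, e.g. 60 Hz */
Uint32 next_tick = SDL_GetTicks();

while (running)
{
    draw_frame();                       /* do a frame draw */

    Uint32 now = SDL_GetTicks();
    while (now >= next_tick)            /* one or more logic ticks behind? */
    {
        do_logic_tick();
        next_tick += MS_PER_TICK;
        now = SDL_GetTicks();
    }

    SDL_Delay(next_tick - now);         /* sleep until the next update is due */
}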

Thread #590932. Printed from Allegro.cc