(X)ubuntu, Nvidia Drivers, VSYNC and overheating CPU/GPU
Yodhe23

Hi,

I noticed in the last few days that my computer has been turning itself off as if it were overheating. So I stripped it down, cleaned it, and tried to find the problem (E6850 CPU, 620GT GPU, 16GB RAM, Xubuntu 14.04).

What I found is that after an update (I have been unable to find which one), unless I now turn off vsync in my Allegro program, it will run one of the cores at maximum and overheat both the CPU and GPU until the machine automatically shuts off. Turning vsync off via al_set_new_display_option reduces the CPU usage down to around 20%.
So I decided to check the Nvidia graphics drivers, as I am currently using the 331.xx ones. The 304.xx and 340.xx drivers both cause the same problem, overheating both the CPU and GPU, regardless of whether the Allegro vsync flag is on or off.
Disabling vsync and flipping in the Nvidia control panel seems to reduce the CPU usage to around 15%.
I would like to try the Nouveau drivers, but they totally mess up my monitors (I am using a dual-screen configuration), which makes working almost impossible.

It seems to me like a fairly serious issue, although I suspect it has more to do with Nvidia than Allegro.
Any suggestions?

Thomas Fjellstrom

Weird, I do not have that same issue as far as I can tell, but I did have a bunch of NVidia driver releases cause my laptop to overheat and shut down, or just blank the screen on boot.

Yodhe23

It is strange: it seems to happen when I am doing nothing. The CPU usage suddenly starts to increase, jumping from 20% to around 50% after 30 seconds of running the Allegro program. When I shut down the program, everything returns to normal, which makes me think there is something very odd going on with Allegro.

Thomas Fjellstrom

Do you have a simple example that shows this behavior?

Chris Katko

nVidia had a huge driver controversy when StarCraft II came out, where the GPU would run wide open and overheat, crashing, rebooting, or damaging the video card.

http://www.dailytech.com/Hot+Starcraft+II+is+Frying+Graphics+Cards+Blizzard+Issues+Temporary+Fix/article19224.htm

Now granted, that issue should have been fixed by now. But it's certainly proof that this can happen.

I know that doesn't solve the issue, but it's a little related background info. But yeah, what Thomas said: try to replicate the issue with the smallest piece of code you can. If you can go through the Allegro examples and one of them does it, then your work is already done; just tell us which example.

ALSO, I almost forgot. Try playing around with the nVidia side VSYNC settings. They allow it in Windows, they should in Linux. Try force off, force on, and "application specific (default)". See if those change anything.

Also, make sure you mention ANY custom configuration settings in the nVidia 3D game enhancements section where you force AA/anisotropic filtering/etc.

Lastly, now that this has happened, are there ANY games outside of Allegro that it also affects?

Yodhe23

Hi, I am trying to track the problem down and reproduce it in the shortest code possible; hopefully I will have something in the next couple of days.

I will boot up my Steam account and run a couple of games on the machine to see whether it is a "common" problem across libraries and not Allegro-specific.

If I turn off the page flipping and vsync in the Nvidia control panel, then the CPU usage drops from one core being maxed out (50+%) down to around 33%, and the CPU temperature remains steady. I also notice that, between psensor and the task manager, the task manager consistently reports CPU usage 10-20% lower.

It seems to be caused by the al_set_target_bitmap in the following function. If I comment it out, the CPU hovers around 13-20% and all is stable; uncomment it (I commented the rest of the function out to a bare minimum) and the CPU usage rises to 50%+, the CPU temperature climbs from 50 up to 75+ centigrade, and then the machine reboots.

void render_button(int x, int y, int xsize, int ysize, ALLEGRO_BITMAP* backimage, char text[255], ALLEGRO_FONT* text_font, int distance)
{
    ALLEGRO_BITMAP *temp_button=al_create_bitmap(xsize,ysize);

    al_set_target_bitmap(temp_button);  //<- this is the problem call

    //al_clear_to_color(al_map_rgba(0,0,0,0));
    //if (backimage)      //al_draw_scaled_bitmap(backimage,0,0,al_get_bitmap_width(backimage),al_get_bitmap_height(backimage),0,0,xsize,ysize,0);
    //if (strlen(text)>0) outline_text(text_font,al_map_rgba(255,255,255,255),xsize/2,ysize/2-(al_get_font_line_height(text_font)/2),ALLEGRO_ALIGN_CENTER,text,distance);
    al_set_target_bitmap(al_get_backbuffer(display));
    //al_draw_bitmap(temp_button,x,y,0);
    al_destroy_bitmap(temp_button);
}

Chris Katko

I don't know if this will solve your problem, but give it a shot as it may narrow things down.

Instead of fetching the backbuffer inside render_button, try getting it once and passing it in (note al_get_backbuffer returns an ALLEGRO_BITMAP*, not an ALLEGRO_DISPLAY*):

ALLEGRO_BITMAP *backbuffer = al_get_backbuffer(display); //outside of render_button, in your main loop

void render_button(/* ...blah blah blah... */, ALLEGRO_BITMAP *backbuffer) //pass it forward with dependency injection
{
//...
  al_set_target_bitmap(backbuffer); //use the pointer, instead of getting it from a call (see following note)
//...
}

Perhaps it's that al_get_backbuffer call every time you render a single object that's blowing up. You should only need to get the backbuffer once per frame. I don't know whether it ends up being an expensive call or a simple pass-the-pointer, but if it's expensive, it could contribute to your problem.

[edit]

Looks like I misread your code. You're saying it's not the backbuffer set_target, but it's the new bitmap one.

Thomas Fjellstrom

Setting the same target bitmap over and over really shouldn't do anything. I'm pretty sure we protect against re-setting the same target.

It looks like it protects against setting the current display:

/* Change the rendering context if necessary. */
   if (old_display != new_display) {

But not the current render target fbo in the OpenGL code. It appears this was attempted, but apparently the fbo can be lost without our being told, so we just re-create it each time the target is set. For the backbuffer that just calls glViewport and some matrix stuff, so it isn't really an issue.

For a bitmap target it seems to do almost nothing if there isn't already an fbo assigned. If there is one, it will glBindFramebuffer, but I doubt that does much in GL if the exact same fbo is given (or at least I hope). I could apitrace that, but ugh.

Edgar Reynaldo
Yodhe23 said:

void render_button(int x, int y, int xsize, int ysize, ALLEGRO_BITMAP* backimage, char text[255], ALLEGRO_FONT* text_font, int distance)
{
    ALLEGRO_BITMAP *temp_button=al_create_bitmap(xsize,ysize);

    al_set_target_bitmap(temp_button);  //<- this is the problem call

    //al_clear_to_color(al_map_rgba(0,0,0,0));
    //if (backimage)      //al_draw_scaled_bitmap(backimage,0,0,al_get_bitmap_width(backimage),al_get_bitmap_height(backimage),0,0,xsize,ysize,0);
    //if (strlen(text)>0) outline_text(text_font,al_map_rgba(255,255,255,255),xsize/2,ysize/2-(al_get_font_line_height(text_font)/2),ALLEGRO_ALIGN_CENTER,text,distance);
    al_set_target_bitmap(al_get_backbuffer(display));
    //al_draw_bitmap(temp_button,x,y,0);
    al_destroy_bitmap(temp_button);
}

This is what classes and structs were made for: then you don't need to pass in all your parameters, you simply pass a pointer to an object. You don't want to create the bitmap every time you draw it; you want to create it once and then draw it when you need to. And al_set_target_bitmap can be a very expensive call, as it changes the texture that is bound.

Yodhe23

What makes me scratch my head is that if I disable vsync, either through al_set_new_display_option(ALLEGRO_VSYNC, 2, ALLEGRO_REQUIRE) at setup or via the Nvidia control panel, the problem goes away. That seems counterintuitive to me, given what vsync generally does: limit the framerate/calls.
It also seems strange that it is thrashing the CPU; although the GPU temps do rise slowly, the CPU dies before the GPU gets too hot.

Thanks for looking into this, as it could potentially be a "serious" problem; I never thought that the disclaimer about Allegro possibly damaging your hardware could in a million years be remotely relevant.

Ahhh Edgar, so basically f**k it, call it once as a global and forget about it. :P
Thanks, I will probably do that as a "patch", but it does seem worth a look by eyes wiser than my own, in case there is an "underlying" issue.

Chris Katko
Yodhe23 said:

Thanks for looking into this, as it could potentially be a "serious" problem; I never thought that the disclaimer about Allegro possibly damaging your hardware could in a million years be remotely relevant.

All software, and especially all software development, is capable of damaging hardware. It's just so rare that we take it for granted. It used to be as simple as setting the wrong VGA registers to damage a video card or monitor.

And the "overheating GPU" thing isn't really Allegro's fault, even if Allegro triggers it. If, in 2015, a GeForce allows itself to overheat to the point of damaging itself, then that's clearly nVidia's fault, like the StarCraft 2 fiasco. It's not like you're doing a low-level firmware update on a hard drive (which could brick it if you mess up). You're calling elaborate APIs, not poking registers anymore.

Yodhe23

It's not really the GPU that is overheating, though I suspect it would if the CPU didn't overheat first.
That is part of what I find strange: altering the vsync (on the graphics card?) affects the CPU usage so radically.
I will try to replicate the problem in the smallest amount of code, and then try it on my other Xubuntu machine, with the same version drivers but a 9600 card, to see whether it is a specific card/driver issue or something else.

Thomas Fjellstrom

CPU spikes on vsync tells me they are attempting to implement vsync with a busy loop or some other such nonsense.

Edgar Reynaldo

CPU spikes on vsync tells me they are attempting to implement vsync with a busy loop or some other such nonsense.

Exactly my thought.

On a similar note, the Flash they use on deviantART for the slideshow widget locks up my entire computer, even the mouse and the little swirling busy icon, until it's done loading. Ever heard of yielding a little CPU now and then? Jeez.

Yodhe23 said:

Ahhh Edgar, so basically f**k it, call it once as a global and forget about it. :P
Thanks, I will probably do that as a "patch", but it does seem worth a look by eyes wiser than my own, in case there is an "underlying" issue.

Basically what I wanted to imply was that resources should only be loaded once or as few times as possible, and definitely never on every frame, unless you're doing something like streaming from a sound or video file. And a little OO never hurt anyone. Organization is good for code. That's all I'm suggesting with the classes and structs.

Yodhe23

Okay, I had the problem again, and it was the al_set_target_bitmap call at fault again, inside a different function. It seems that calling al_set_target_bitmap repeatedly anywhere (say, in an ALLEGRO_EVENT_TIMER handler) causes the CPU to heat up to unacceptable levels.
I can work around the problem, but I am still at a loss as to why this happens. Anyway, I should have some time tomorrow to delve deeper into what is going on, but I am not too hopeful.

Chris Katko
Yodhe23 said:

It seems that calling al_set_target_bitmap repeatedly anywhere (say, in an ALLEGRO_EVENT_TIMER handler) causes the CPU to heat up to unacceptable levels.

Is your computer in disrepair? Do other programs cause this, or solely Allegro ones?

Thomas Fjellstrom

Are you changing the target repeatedly? Is there a reason you're calling that function a lot? It normally shouldn't be required.

I think we could potentially make it so Allegro doesn't do anything if it's called and the current target hasn't changed. Not sure how simple that'll be (it depends on how the target stuff is handled...).

Yodhe23

The first thing I did was rebuild and check my machine's hardware, and it passed a stress test fine.
I don't seem to have this problem when playing the limited number of Steam Linux games I own.

The reason I have been changing the target repeatedly is that I was used to creating temporary bitmaps on the fly, as I did with Allegro 4 with no problems (maybe because those were memory bitmaps rather than video ones?).

-edit
I have just recompiled and tested the game on my other machine (Xubuntu, Q6700, 9500GT, 4GB, 311.113 drivers), and whilst it does seem to thrash one of the cores, it does not show the overheating problem. I am going to go out and buy a "new", different graphics card model to replace the 620GT in the other machine, and see if the issue persists.
If I have the time I will also boot the overheating machine into Windows and recompile the game to see if the problem persists there.
However, I am beginning to suspect I have just discovered one of those rare combinations of OS, drivers, hardware and software that triggers an unforeseen consequence.

As it is, I will just take people's advice and rewrite my code so it is more idiot/me-proof.

-2nd edit
Oh yeah, now I remember why I built the card graphics on the fly: I watched the memory usage for the game rise from 200MB to 400MB. So, in answer to your question of why I am calling it that often: to save memory (thinking of mobile platforms). I suppose I could rejigger the functions to create the bitmaps before the event loop, but because of the way I have structured things it's quite an arse, so I would rather not; instead I have a separate image for each card.

Thomas Fjellstrom

Creating new video bitmaps, and especially drawing to them, can be slower than you may expect. It really is better to structure things so that isn't required. Maybe try setting up a resource manager, so you can just ask that manager code for a bitmap and it will load it if and when it's needed (and can free things that haven't been used "recently").

What do you mean by a separate image for each card?

Yodhe23

By a separate image, I mean an ALLEGRO_BITMAP *image, in the deck.card struct.

I guess there are some quirks from having done things the old Allegro 4 way that I will have to work out. I have had a think about the resource manager, but given the state of the project (and it is coded in C) I am not sure how worthwhile it would be.

Thomas Fjellstrom
Yodhe23 said:

By a separate image, I mean an ALLEGRO_BITMAP *image, in the deck.card struct.

Ah. I kind of thought that's what you meant, but had to make sure.

Quote:

given the state of the project (and it is coded in C) I am not sure how worthwhile it would be.

Shouldn't be much different than in C++. The "cleanest" way would be to use a struct and some prefixed functions that take that struct.

ResManager *rm_create(...);
ALLEGRO_BITMAP *rm_get_bitmap(ResManager *, const char *image);

The manager can cache the bitmaps in the ResManager struct, so they don't have to be reloaded all the time.

You can also stash all of the images in a texture atlas so you aren't changing textures all the time; the returned bitmap is then just a sub-bitmap.

Instead of passing the resource manager to each call, you can store the manager struct in a static variable in the resource manager file; just make sure to provide an rm_init and rm_deinit type set of functions so the manager can set its stuff up. I don't consider that quite as "clean" as passing the manager var around, but sometimes it gets too complicated to pass the rm var in everywhere.

Modern hardware-accelerated graphics don't mesh well with how you needed to do things in Allegro 4. Most of the time with Allegro 5 it's best just to draw everything every frame, which is something you had to avoid in Allegro 4.

Yodhe23

It is fine to draw everything each frame, but it does seem a little counterintuitive to find that generating bitmaps (and setting the target) each frame causes such problems.

As I said before, I think I will just rejigger it to create the bitmaps just before the event loops. Mostly because a resource manager seems like spooning another layer of spaghetti onto the code, and it is probably something I would consider on a new/clean project rather than at the juncture I am at now.

Thanks for your comments. I will chalk it up to experience: never use al_set_target_bitmap with any "frequency", as it can cause unforeseen consequences.

Thomas Fjellstrom
Yodhe23 said:

It is fine to draw everything each frame, but it does seem a little counterintuitive to find that generating bitmaps (and setting the target) each frame causes such problems.

Maybe? But in the end, what is happening is that you are creating a new bitmap, then uploading it to the GPU, every single time. Uploading to and downloading from the GPU is a fairly expensive process[1], as it requires transfers over the PCIe bus.

Quote:

Mostly as a resource manager seems to be spooning another layer of spaghetti onto the code

I find it makes for less spaghetti: you get a nice layered set of subsystems that each do one basic thing, instead of a ball of code in your main game function that does everything.

References

  1. Or, more accurately, it is more expensive than uploading once and reusing the same texture.
Yodhe23

Believe me, coming from a pascal procedural background my code probably resembles the worst atrocities against programming a monkey could make and still get something to work.

Thomas Fjellstrom
Yodhe23 said:

Believe me, coming from a pascal procedural background my code probably resembles the worst atrocities against programming a monkey could make and still get something to work.

Hey now, it is more than possible to create clean and readable code in C and Pascal. Now old school Basic? not so much ;)

Yodhe23

Oh my code is perfectly clean and readable (to me). It is probably just not how you (or anyone else) would do things, but it works for me.

Ahhhh the joys of BASIC on the speccy rubber keyboard when you didn't have to type out everything, and all variables were global, and functions were just gosubs.... Brings back a tear to this middle-aged man.

So would you have any pointers on reusing textures instead of creating bitmaps every time, as a potential solution?

Thomas Fjellstrom
Yodhe23 said:

So would you have any pointers on reusing textures instead of creating bitmaps every time, as a potential solution?

Hm, well you can load them at start like you suggested, and store them in a convenient place (an array perhaps? or a struct with a bunch of individual ALLEGRO_BITMAP pointers?).

Yodhe23

Oh, I did that already, just as you suggested: an array of bitmaps created at the start of the function, instead of in the event loop.

Thomas Fjellstrom

Without splitting the actual resource management out into a separate set of functions, that may be the easiest solution. You can even set up defines or constants to "name" bitmaps:

enum {
PLAYER_SPRITE_N = 0,
PLAYER_SPRITE_W,
HUD_BACKGROUND,
...
};

al_draw_bitmap(bitmap[HUD_BACKGROUND], ...);

Yodhe23

I use defines for that purpose, though enums do have that nice list ability; again, another hangover from learning programming back in the day.

P.S. Yay, memory is back down to 150MB usage.

Thomas Fjellstrom

I used to use defines all the time, but I'm starting to like enums more :)

Also glad you got things manageable.

Thread #615154. Printed from Allegro.cc