Allegro.cc - Online Community

This thread is locked; no one can reply to it.
Rare bitmap corruption
Space cpp
Member #16,322
May 2016

My game uses a few texture atlases generated at runtime, and the most important one is quite big: 8192x4096. (I could use slightly smaller dimensions, but I need to obey the power-of-two rule to avoid rendering artifacts.)
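To put numbers on that, here is a minimal sketch in plain C (no Allegro calls, so it compiles on its own) that rounds a dimension up to the next power of two and estimates the atlas's memory footprint. The helper names are hypothetical, and 4 bytes per pixel (32-bit RGBA, no mipmaps) is an assumption. In Allegro, the real size limit for the current display can be queried with al_get_display_option(display, ALLEGRO_MAX_BITMAP_SIZE).

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the next power of two (hypothetical helper).
   Valid for 1 <= n <= 2^31. */
static uint32_t next_pow2(uint32_t n)
{
    assert(n >= 1u && n <= 0x80000000u);
    n--;
    n |= n >> 1;
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    return n + 1u;
}

/* Memory for one w x h texture, assuming 4 bytes per pixel (32-bit RGBA)
   and no mipmaps. */
static uint64_t atlas_bytes(uint32_t w, uint32_t h)
{
    return (uint64_t)w * h * 4u;
}

/* Nonzero if both dimensions fit within max_size, i.e. the value a display
   reports as its maximum bitmap size. */
static int atlas_fits(uint32_t w, uint32_t h, uint32_t max_size)
{
    return w <= max_size && h <= max_size;
}
```

For the 8192x4096 atlas above, atlas_bytes() gives 134,217,728 bytes (128 MiB) for the base level alone.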

Then a few months ago a player reported a bug where most objects (all from that large atlas) were invisible. He also showed me a screenshot where one specific object was being drawn with its lower half corrupted. It was an intermittent problem.
I asked him to use a debug command to export the atlas using al_save_bitmap, and the result was a corrupted PNG with no content.
After several tries we found out that this bug doesn't happen with the OpenGL renderer, so we made switching to it a temporary fix.

But now this bug has arisen again, when someone was streaming my game :-/

Besides making the game default to OpenGL, are there any ideas about what I can do to solve or avoid this bug?

----------
My games

Arthur Kalliokoski
Second in Command
February 2005

It could easily be a video driver bug or something, not necessarily a problem with your code or Allegro. Ask the users who have problems what video card they have along with the driver version, what OS they use, yadda yadda yadda. If, say, the problem only happens on Nvidia it would indicate the problem is there. If it's only one guy that's playing your game, you need a wider base to really narrow it down.

They all watch too much MSNBC... they get ideas.

Chris Katko
Member #1,881
January 2002

Yes, this can easily be a driver bug or a user's hardware dying, and it's only the way Allegro sets up and uses OpenGL/D3D that's triggering it.

Definitely find out what card they're using (nVidia, AMD, Intel). Also whether they're on Windows or Linux! And whether it's overclocked or overheating! Some games, depending on how their timing code runs, can overheat computers or GPUs and overstress laptops and other crappily designed computers.

I "think" the Factorio team (originally made with Allegro 5) ran into driver issues with large texture atlases, so they added support for smaller ones. They use absolutely insane amounts of VRAM for texture atlases.

Here's someone having a similar problem with texture corruption, using an AMD card:
https://forums.factorio.com/viewtopic.php?t=65466

On the other hand, if you're using a streaming atlas [that is, swapping in new textures on the fly and not just using a constant tile map], well, confirm that it's in fact working!

Here's another user that had "the common problem with AMD GPUs and windows 10"

https://www.reddit.com/r/factorio/comments/3qx62q/getting_strange_texture_glitches_any_suggestions/

How "consistently" is this problem happening? Because if it's rare, it will be difficult to find! It's not like a crash where you've got a stack trace to go on.

If it's an Allegro 5 or user code bug, it's going to be hard as heck to find without some code that reliably triggers the bug.

-----sig:
“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs
"Political Correctness is fascism disguised as manners" --George Carlin

Space cpp
Member #16,322
May 2016

I sent that person a message asking for his system specs, let's wait.

About the streamer, this is what I found on his profile page:
Streaming PC:
AMD Ryzen 9 3900X 12-core
Asus TUF Gaming GeForce GTX 1650 Super Overclocked 4GB

I don't do texture streaming. All atlases are generated when the game loads; the only exception is when the user manually loads a mod, in which case the atlas is destroyed and recreated.

Chris Katko said:

How "consistently" is this problem happening? Because if it's rare, it will be difficult to find! It's not like a crash where you've got a stack trace to go on.

This is the second occurrence, to my knowledge.

So, I guess I'm really having the same problem the Factorio guys had. And thinking about it, 8192x4096 isn't really that large, considering modern machines report a max bitmap size of 16384.
Could it be a limitation of Direct3D 9?


Edgar Reynaldo
Major Reynaldo
May 2007

Chris Katko
Member #1,881
January 2002

Supposedly with Vista and later, this is no longer the case:

https://docs.microsoft.com/en-us/windows/win32/direct3d9/dx9lh?redirectedfrom=MSDN#texture-creation-in-system-memory

Quote:

Device Behavior Changes
Devices are now only lost under two circumstances; when the hardware is reset because it is hanging, and when the device driver is stopped. When hardware hangs, the device can be reset by calling ResetEx. If hardware hangs, texture memory is lost.

[...]

Now with DirectX for Windows Vista, calling Reset after a mode change does not cause texture memory surfaces, textures and state information to be lost and these resources do not need to be recreated.

Not sure if additional changes have to be made to enable it, but I would imagine not. I'm also not sure whether Allegro, or the app handling that Allegro event, is responsible for calling D3D's Reset function. And why does it mention both ResetEx and Reset, but only link ResetEx? Possibly Allegro (or the calling app, when handling the lost-context event) has to call ResetEx instead of Reset?

https://docs.microsoft.com/en-us/windows/win32/api/d3d9/nf-d3d9-idirect3ddevice9-reset

https://docs.microsoft.com/en-us/windows/win32/api/d3d9/nf-d3d9-idirect3ddevice9ex-resetex

A quick GitHub search has no actual results for calling Reset or ResetEx, but there is:

d3d_reset_state(ALLEGRO_DISPLAY_D3D *disp)
al_set_d3d_device_release_callback()
al_set_d3d_device_restore_callback()

[edit] HEY, found it. GitHub's search sucks sometimes.

As long as ALLEGRO_CFG_D3D9EX is defined, which appears to be the case when the CMake option WANT_D3D9EX is turned on:

https://github.com/liballeg/allegro5/blob/4dff2ed93c5d56984086b834dcd9388a01ece7d3/src/win/d3d_disp.cpp#L1031

https://github.com/liballeg/allegro5/blob/2f39d7ff457c66818ebcfb5e49be33146c93aa68/CMakeLists.txt#L864

Which appears to be... off by default?!

option(WANT_D3D9EX "Enable Direct3D 9Ex extensions (Vista)" off)


Space cpp
Member #16,322
May 2016

If a lost display causes ALL bitmaps to be lost, then that's not what's happening here, since everything else keeps rendering normally.


Chris Katko
Member #1,881
January 2002
avatar

Yeah sorry, that's a separate issue from the one I just commented about.


Edgar Reynaldo
Major Reynaldo
May 2007

Space cpp
Member #16,322
May 2016

Alright, ALLEGRO_EVENT_DISPLAY_LOST events should now show a native message box. Time for testing.
OpenGL will be the default renderer until I find a definitive solution.
I'm considering creating an option to split the atlas in two.


Chris Katko
Member #1,881
January 2002
avatar

An atlas can mean many different configurations, and not all of them have the same exposure to this bug.

- Are you algorithmically building an atlas from many images? Tweak the algorithm to produce a few smaller ones. This is always a smart idea because not everyone playing your game will have modern hardware that can run that size. (Though they really amped up the max size in the last ten years thanks to megatextures being a fad.)

- Do you have just one gigantic image file that you put all your sprites on and you just load it in? Is it just a gigantic tile map, or actually tons of sprites packed in?

There may be non-Allegro related posts through Google that you can find with large texture sizes randomly corrupting in Direct3D. As always, it's a heck of a hard problem space to clamp down on unless you can find a piece of code that triggers the problem reliably--especially since it's not crashing.

If the entire atlas is affected by corruption, one thing you could do is constantly monitor the atlas in a debug build (just a few pixels), and if they magically change, immediately drop a core dump. You might see a trend of something that was going on in your program, common across multiple dumps. Also tie that in with a "sanity check" (checking that those pixels make sense), and if they don't, immediately free the memory and create a new copy of the atlas so the user can keep playing with only a momentary glitch. Also make sure those core dumps get uploaded, either automatically or by prompting the user, so you can examine them.
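A minimal sketch of the "watch a few pixels" idea in plain C, assuming the atlas pixels are available as a buffer of 32-bit values. The AtlasCanary type and both functions are hypothetical names for illustration, not Allegro API. In a real Allegro program you would probably read only the sampled pixels (for example with al_get_pixel() or a small al_lock_bitmap_region() call with ALLEGRO_LOCK_READONLY) rather than downloading the whole 8192x4096 texture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CANARY_COUNT 8

/* A snapshot of a few pixels taken right after the atlas is built. */
typedef struct {
    size_t   offset[CANARY_COUNT];  /* index into the pixel buffer */
    uint32_t value[CANARY_COUNT];   /* pixel value at build time   */
} AtlasCanary;

/* Record pixels spread evenly from the first to the last one, so that
   corruption of, say, only the lower half is still caught. */
static void canary_record(AtlasCanary *c, const uint32_t *pixels, size_t count)
{
    assert(count >= CANARY_COUNT);
    for (size_t i = 0; i < CANARY_COUNT; i++) {
        c->offset[i] = (count - 1) * i / (CANARY_COUNT - 1);
        c->value[i]  = pixels[c->offset[i]];
    }
}

/* Return how many sampled pixels changed since canary_record(). */
static int canary_check(const AtlasCanary *c, const uint32_t *pixels)
{
    int changed = 0;
    for (size_t i = 0; i < CANARY_COUNT; i++)
        if (pixels[c->offset[i]] != c->value[i])
            changed++;
    return changed;
}
```

Sampled pixels that happen to be fully transparent won't reveal corruption that zeroes memory, so reserving a few pixels with a known non-zero color when building the atlas would be a sturdier variant of the same check.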


Space cpp
Member #16,322
May 2016

The atlas is generated at runtime after all unit/building sprites have been loaded. The sprites are sorted by size and then drawn to the large bitmap, followed by converting each bitmap into a sub-bitmap for easy usage.
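The process described here (sort by size, then draw into one big bitmap) resembles what is usually called shelf packing. Below is a minimal sketch in plain C, assuming "sorted by size" means sorted by height, tallest first. The Rect type and shelf_pack() are hypothetical names, and the actual code in the game may well differ. Each placed rect's x, y, w, h is what would then be passed to al_create_sub_bitmap(atlas, x, y, w, h).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    int w, h;    /* sprite size in pixels (input)         */
    int x, y;    /* top-left corner in the atlas (output) */
    int placed;  /* 1 if the sprite fit, 0 otherwise      */
} Rect;

/* qsort comparator: tallest first. */
static int by_height_desc(const void *a, const void *b)
{
    const Rect *ra = a;
    const Rect *rb = b;
    return (rb->h > ra->h) - (rb->h < ra->h);
}

/* Shelf packing: place sprites left to right in rows ("shelves"); when a row
   is full, start a new one below the tallest sprite of the current row.
   `padding` pixels are reserved right of and below each sprite to keep
   neighbours from bleeding into each other under filtering.
   Sorts `r` in place and returns how many sprites did not fit. */
static int shelf_pack(Rect *r, size_t n, int atlas_w, int atlas_h, int padding)
{
    assert(atlas_w > 0 && atlas_h > 0 && padding >= 0);
    qsort(r, n, sizeof *r, by_height_desc);

    int x = 0, y = 0, shelf_h = 0, failed = 0;
    for (size_t i = 0; i < n; i++) {
        int w = r[i].w + padding;
        int h = r[i].h + padding;
        r[i].placed = 0;
        if (w > atlas_w) {            /* wider than the whole atlas */
            failed++;
            continue;
        }
        if (x + w > atlas_w) {        /* current shelf full: open a new one */
            x = 0;
            y += shelf_h;
            shelf_h = 0;
        }
        if (y + h > atlas_h) {        /* no vertical room left */
            failed++;
            continue;
        }
        r[i].x = x;
        r[i].y = y;
        r[i].placed = 1;
        x += w;
        if (h > shelf_h)
            shelf_h = h;
    }
    return failed;
}
```

Shelf packing is simple but wastes the space above shorter sprites in each row; skyline or MaxRects packers typically fill the atlas more tightly when it is close to full.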

Looking at my exported atlas, it appears that I can't fit everything into a 4096x4096 bitmap even if I tweaked the algorithm to use every little empty space. I might need to use 2 atlases.
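A cheap way to confirm this before touching the packer is a lower bound from area alone: no packing algorithm can fit sprites whose total pixel area exceeds the page area. A sketch in plain C follows (min_pages() is a hypothetical helper). If it returns 2 for 4096x4096 pages, a single page is impossible no matter how much empty space is reclaimed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lower bound on the number of page_w x page_h atlas pages needed for n
   sprites: total sprite area divided by page area, rounded up. Any real
   packer wastes some space, so the true count can only be equal or higher. */
static unsigned min_pages(const int *w, const int *h, size_t n,
                          int page_w, int page_h)
{
    assert(page_w > 0 && page_h > 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (uint64_t)w[i] * (uint64_t)h[i];
    uint64_t page = (uint64_t)page_w * (uint64_t)page_h;
    return (unsigned)((total + page - 1) / page);
}
```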

Checking the atlas for corruption seems like a nice idea. I'm just worried that doing this constantly will hurt the game's performance.


Chris Katko
Member #1,881
January 2002
avatar

Reading a few bytes (since we assume the entire thing is going to corrupt) shouldn't be much of a performance hit. It'll be immediately apparent whether it hurts your frames and is worth it or not. But a few bytes over the bus should be nothing compared to the back-and-forth of a typical frame.

Space cpp said:

I might need to use 2 atlases.

That's really not a big deal unless you, absolute worst case, draw one-by-one from each atlas back and forth. And you can usually avoid that by knowing what you're drawing by category. Clouds last. Ground first. Then enemies and heroes. Then particles. Etc. So you stuff each atlas with related stuff based on expected draw order.

A few dozen context switches is nothing. Thousands upon thousands is something.
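To make "a few dozen vs. thousands" concrete, here is a small sketch in plain C that counts how often the source atlas changes over a frame's draw order (the function name is hypothetical). In Allegro, deferred drawing via al_hold_bitmap_drawing(true) is most effective when consecutive draws come from the same parent bitmap, so each change counted here is roughly a point where a batch gets flushed; treat that mapping as an approximation.

```c
#include <assert.h>
#include <stddef.h>

/* Count how often the source atlas changes across one frame's draw calls.
   atlas_of_draw[i] is the index of the atlas used by the i-th draw. */
static size_t count_atlas_switches(const int *atlas_of_draw, size_t n)
{
    assert(atlas_of_draw != NULL || n == 0);
    size_t switches = 0;
    for (size_t i = 1; i < n; i++)
        if (atlas_of_draw[i] != atlas_of_draw[i - 1])
            switches++;
    return switches;
}
```

For six draws split across two atlases, interleaving them ({0, 1, 0, 1, 0, 1}) gives 5 switches, while grouping by layer ({0, 0, 0, 1, 1, 1}) gives 1.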


Edgar Reynaldo
Major Reynaldo
May 2007

Bump for anti-lock.

Have you discovered the cause of the atlas corruption? Were there any ALLEGRO_EVENT_DISPLAY_LOST events fired? Did you detect bitmap corruption at any specific points in time? Did you try rebuilding allegro with D3D9_EX support?

Space cpp
Member #16,322
May 2016

Not yet.
No ALLEGRO_EVENT_DISPLAY_LOST events until now.
I spent the last few days working on the netcode. I'm gonna look into this issue again tomorrow.

Edit1: I implemented a function to test a bitmap for corruption, made it test every bitmap before and after the atlas creation.
I sent the executable to that same person, I'm waiting for his reply.
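The actual test function isn't shown in the thread. One possible version, sketched in plain C, assumes the pixels of both the source sprite and the atlas can be read as 32-bit values in the same pixel format: hash each sprite before the atlas is built, then hash the matching sub-rectangle of the atlas afterwards and compare. hash_region() is a hypothetical name; the hash itself is standard 32-bit FNV-1a.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a hash of a w x h rectangle inside a buffer of 32-bit pixels.
   `pitch` is the distance between rows in pixels, so the same function can
   hash a standalone sprite (pitch == sprite width) or the matching
   sub-rectangle of the atlas (pitch == atlas width). */
static uint32_t hash_region(const uint32_t *pixels, ptrdiff_t pitch,
                            int x, int y, int w, int h)
{
    assert(w >= 0 && h >= 0);
    uint32_t hash = 2166136261u;                /* FNV-1a offset basis */
    for (int row = 0; row < h; row++) {
        const uint32_t *p = pixels + (ptrdiff_t)(y + row) * pitch + x;
        for (int col = 0; col < w; col++) {
            for (int b = 0; b < 4; b++) {       /* one byte at a time */
                hash ^= (p[col] >> (8 * b)) & 0xFFu;
                hash *= 16777619u;              /* FNV-1a prime */
            }
        }
    }
    return hash;
}
```

This only makes sense if sprites are copied into the atlas unchanged (integer positions, no scaling, same pixel format). Note also that Allegro's ALLEGRO_LOCKED_REGION reports its pitch in bytes (and it can be negative), so it would need dividing by 4 before being passed as the pixel-based pitch used here.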


Edgar Reynaldo
Major Reynaldo
May 2007
avatar

Space cpp
Member #16,322
May 2016

Unfortunately, it seems his machine can't trigger the bug anymore.
I'm already looking for someone else with the same problem to help me.

By the way, I've searched through a lot of old Factorio posts, and it seems the root of the problem is Direct3D 9 being too old for modern machines.
So I guess I'm destined to rely on workarounds, or abandon D3D9 completely.

----------
My games

Chris Katko
Member #1,881
January 2002
avatar

Space cpp said:

it seems the root of the problem is Direct3D9 being too old for modern machines.

That sounds like driver issues then. Vendors aren't testing D3D9 because it's for older games.


Edgar Reynaldo
Major Reynaldo
May 2007
avatar
