Maybe you guys can help me wrap my head around this, ideally in a way that doesn't kill the CPU every frame.
Say we have a picture of the world map, let's call it Background Map:
[attached image 603082: Background Map, 800×300]
And we have a screenshot of the game:
The screenshot needs to be aligned to the corresponding spot on the Background Map with 100% pixel precision. I'm currently able to obtain the coordinates of the camera, but it still doesn't map very well. Here's an example result:
[attached image 603084: example of the misaligned result, 800×300]
The thing is, these alignment errors never go beyond 10-20 pixels or so. I don't have any control over this; all I need to do is find a way to align the image back to where it should be. The marked red area is what I would use to search for the possible correct position.
The end result should be something like this, mapping the screenshot to the correct coordinate.
[attached image 603083: the desired end result, 800×300]
What I can't wrap my head around so far is how I could try to align this. Keep in mind the screenshot's contents can obviously differ from the world map (in this case, there's the Mario sprite there).
I thought about searching from the outer edges and comparing within the search area, but I'm not really sure that would work for every area. At least I can reduce the search area drastically by using the camera values. Got any smart ideas on how to do this?
I should clarify that performance isn't very critical, but it shouldn't take 2 seconds to do it.
Edge detection is simple to do:
This requires an exact match, down to the RGB values, for every edge pixel. It works in this case and is fast enough. To make it more flexible you could replace the direct comparisons with a fuzzy match and allow for x% of failures.
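The original code isn't shown here, but an exact edge-row comparison with an allowance for a few failures might look roughly like this sketch (the function name, the packed 0xAARRGGBB layout, and max_fail are my assumptions, not the thread's actual code):

```c
#include <stdint.h>

/* Hypothetical sketch: compare one edge row of the screenshot against the
 * map at a candidate (x, y) offset, allowing up to max_fail mismatched
 * pixels (the "x% of failures" idea). Pixels are packed 0xAARRGGBB values
 * in row-major arrays. */
static int edge_row_matches(const uint32_t *map, int map_w,
                            const uint32_t *shot_row, int shot_w,
                            int x, int y, int max_fail)
{
    int fails = 0;
    for (int i = 0; i < shot_w; i++) {
        if (map[y * map_w + x + i] != shot_row[i]) {
            if (++fails > max_fail)
                return 0; /* too many mismatches: bail out early */
        }
    }
    return 1;
}
```

With max_fail set to 0 this is the exact match; raising it gives the fuzzy variant.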
The final match is shown in this case. You could break ties based on strength of match.
I think the above code could be adapted to do a complete square match if you bailed out as soon as there were enough failures (as opposed to always calculating the entire match for every pixel location). Obviously if you were to do a more intelligent approach where you only focused on probable areas based on a general color / pattern match, you could speed things up.
If you always know within 20 pixels or so of where it should go, then I'd just rate each of those 20 locations based on a complete analysis and pick the best.
Thanks for the answer. I have a clearer idea now.
Still, it seems I need to account for something else I didn't notice before. The output screenshot from the emulator doesn't have exactly the same colors as the ripped map I got from vgmaps.com. And sadly there isn't a consistent pattern in the RGB differences per pixel (like, say, -2 in each RGB channel). So I guess I'll have to do direct per-pixel comparisons and check whether each value is in the same range. The differences aren't larger than 7 in any color channel from what I've noticed.
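A per-channel tolerance check along those lines could be sketched like this (the function name and the packed 0xAARRGGBB layout are made up for illustration):

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative helper: two packed 0xAARRGGBB pixels count as "the same"
 * if each color channel differs by at most tol, to absorb the
 * emulator-vs-vgmaps color differences of up to ~7 per channel. */
static int pixels_similar(uint32_t a, uint32_t b, int tol)
{
    int dr = (int)((a >> 16) & 0xFF) - (int)((b >> 16) & 0xFF);
    int dg = (int)((a >> 8) & 0xFF) - (int)((b >> 8) & 0xFF);
    int db = (int)(a & 0xFF) - (int)(b & 0xFF);
    return abs(dr) <= tol && abs(dg) <= tol && abs(db) <= tol;
}
```

Calling it with tol = 7 would cover the worst difference observed so far.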
Your solution did look sexy though.
So I guess I'll start searching from the edges inward. But instead of doing a full square check, I could skip several rows/columns to make it faster, like this:
[attached image 603089: sketch of the skipped-row/column search pattern, 256×224]
Like you said, I'll need to do a fuzzy check, since it's very likely something else will be at the edge of the screen, like an enemy or an animated object. I'll probably keep just the 10 best comparisons, then do more precise checks from there until only one of them remains the "best".
The user would be able to set the precision of this in a GUI, including a full square check. It's just so it doesn't take an insane amount of time.
Just a question: would it be much faster if I stored the RGB values of the pixels I want to compare (the background search area and the screenshot) in an array of my own, instead of requesting them with al_get_pixel on every check? I'll be using MEMORY_BITMAPs if it helps, since I'm dealing with very large images for the world map.
EDIT: Or I suppose reading the data from the lock like you showed me is enough? I don't know if I should extract the RGB values to do the check, or if there's some other way to detect whether the values are similar.
For speed, you should lock the bitmaps as I was doing.
You can get the RGB values by using the uint32_t cast, and doing normal shift operations to get at the values.
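For instance, the cast-and-shift could look like this sketch, assuming the bitmap was locked in a 32-bit ARGB format (the function names are mine; in real code data and pitch would come from the ALLEGRO_LOCKED_REGION returned by al_lock_bitmap):

```c
#include <stdint.h>

/* Read one pixel from locked bitmap data. pitch is in bytes, which is why
 * we step with a byte pointer before casting the row to uint32_t. */
static uint32_t get_pixel32(const void *data, int pitch, int x, int y)
{
    const uint8_t *row = (const uint8_t *)data + (long)y * pitch;
    return ((const uint32_t *)row)[x];
}

/* Unpack the channels with normal shift-and-mask operations (ARGB order). */
static void unpack_argb(uint32_t px, int *r, int *g, int *b)
{
    *r = (int)((px >> 16) & 0xFF);
    *g = (int)((px >> 8) & 0xFF);
    *b = (int)(px & 0xFF);
}
```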
Regarding the fuzzy match, I would simply do something like a sum of squared errors for each pixel that you test. Keep track of the best one, and use that. If you know there's a match, then it should always give you the best result.
So something like:
best_sse = infinity
for each candidate spot on the map:
    sse = 0
    for each pixel:
        sse += (R1 - R2)^2 + (G1 - G2)^2 + (B1 - B2)^2
        if sse > best_sse: skip to the next spot
    best_sse = sse
    best_spot = this spot
So basically, as soon as the SSE is larger than the best known one, stop testing that area, and move on to the next. I'm guessing that this is adequate for what you need. Using HSV instead of RGB might give better results, but I'd start simple and see how it goes.
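Put together, a brute-force search with that bail-out might look like this sketch (invented names, plain row-major 0xAARRGGBB arrays standing in for the locked bitmaps):

```c
#include <stdint.h>
#include <limits.h>

/* Squared error between two packed 0xAARRGGBB pixels, summed per channel. */
static long long pixel_sse(uint32_t a, uint32_t b)
{
    long long dr = (long long)((a >> 16) & 0xFF) - (long long)((b >> 16) & 0xFF);
    long long dg = (long long)((a >> 8) & 0xFF) - (long long)((b >> 8) & 0xFF);
    long long db = (long long)(a & 0xFF) - (long long)(b & 0xFF);
    return dr * dr + dg * dg + db * db;
}

/* Test every placement of the screenshot over the map and keep the one
 * with the lowest total SSE. */
static void find_best_offset(const uint32_t *map, int map_w, int map_h,
                             const uint32_t *shot, int shot_w, int shot_h,
                             int *best_x, int *best_y)
{
    long long best_sse = LLONG_MAX;
    for (int y = 0; y + shot_h <= map_h; y++) {
        for (int x = 0; x + shot_w <= map_w; x++) {
            long long sse = 0;
            /* Stop scoring this spot as soon as it is already worse than
             * the best one seen so far. */
            for (int j = 0; j < shot_h && sse < best_sse; j++)
                for (int i = 0; i < shot_w; i++)
                    sse += pixel_sse(map[(y + j) * map_w + x + i],
                                     shot[j * shot_w + i]);
            if (sse < best_sse) {
                best_sse = sse;
                *best_x = x;
                *best_y = y;
            }
        }
    }
}
```

In practice you'd restrict the x/y loops to the 20-pixel window around the camera coordinates instead of scanning the whole map.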
Woot! This seems to be working.
Just so I understood what I was writing, I didn't do any of the optimizations that read the data directly through the locks. It's not really fast, but I can trade precision with the search_skip int. The current value is 5, so it searches rects like in my image above, separated by 5 pixels from each other.
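The sampled scoring that search_skip drives could be sketched like this (names and the packed-ARGB layout assumed, not from the real code):

```c
#include <stdint.h>

/* Sampled SSE score for one candidate (x, y) placement: only every
 * search_skip-th pixel in each direction is tested, so a skip of 5
 * checks 1/25th of the pixels. */
static long long sampled_sse(const uint32_t *map, int map_w,
                             const uint32_t *shot, int shot_w, int shot_h,
                             int x, int y, int search_skip)
{
    long long sse = 0;
    for (int j = 0; j < shot_h; j += search_skip) {
        for (int i = 0; i < shot_w; i += search_skip) {
            uint32_t a = map[(y + j) * map_w + x + i];
            uint32_t b = shot[j * shot_w + i];
            long long dr = (long long)((a >> 16) & 0xFF) - (long long)((b >> 16) & 0xFF);
            long long dg = (long long)((a >> 8) & 0xFF) - (long long)((b >> 8) & 0xFF);
            long long db = (long long)(a & 0xFF) - (long long)(b & 0xFF);
            sse += dr * dr + dg * dg + db * db;
        }
    }
    return sse;
}
```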
And these are the images I used. In the end it mapped with 100% pixel precision, even though the colors are different!
The background bitmap (a section of the world map, near the camera values):
[attached image 603090: background bitmap, 300×260]
And the screenshot picture:
[attached image 603091: screenshot, 256×224]
So now off to optimizing it I guess...
EDIT: Here's my optimized version:
It takes a lot less time than the last one, but I'm not sure if it's fast enough yet... Increasing search_skip makes it faster, but less accurate. Setting it to an insane value like 900 would make it search only the outer edges. It has worked well so far in all cases. I'm not sure how it will perform in real time with enemies coming, but we'll see.
Is there anything else that can be optimized there?
You're not actually going to use this in real time, are you? Mario-like maps are normally made out of several parallax scrolling layers, not a single bitmap like you show above. The background is usually at least one layer (counting the bushes, at least two), probably more; the platforms are another layer or two; objects like bricks would be another; and the player and NPCs are yet another.
Or at least that's how I'd do it. And they'd all be able to move at different rates.
I did a little hack on the emulator for the parallax backgrounds.
Consider this part of a recording method: it doesn't need to be as fast as possible to keep a good FPS, but it shouldn't kill the CPU.
And believe it or not, the bushes and the map are actually a single layer. There's only one parallax background.
I still don't quite get what you're using this for.
You're one curious moose aren't you?
Here's a video to show off what I wanted to do, as well as the problem:
The whole recording of this video is done in the emulator, but as you can see, there are some problems aligning the screenshot when running and jumping. I assume the game uses something else to manage that apart from the Layer 1 X and Y positions (which I've been able to track with the RAM Searcher), so my best fix is to align it this way.
The solution has to be as game-independent as possible, so I can't just go around trying to figure out how SMW renders this; I'm planning to do these videos with other games too. Yeah, the recording does take a while, especially for a 1080p video, but time isn't the issue here. It has to be as automatic as possible. Still, optimizations are always welcome.
Anyway, I guess this means that there are no more Allegro-related optimizations to do?
EDIT: And before you ask, it's done in Allegro and not on the native emulator's routines because I wanted to port it to other emulators as well.
<Usefulness of the project not to be discussed>
Ah. Interesting. I'm not sure how else to improve it, other than looking for some "computer vision" algorithms to more accurately detect the right area.
Awesome, it's working very nice!
Searching only the edges, everything was crap. But using a precision of 10 (rectangles separated by 10 pixels), the accuracy is amazing! I'll have to try different values, since it does add a bit of overhead to each recorded frame, and see what the best-balanced option is.
Thanks for the help! Here's a short video of how it looks when aligned properly:
EDIT: How the hell does Youtube recognize it's a Mario related video??! I didn't even add those tags.
Now try getting a little fancier and pull out the Koopas and other things, to remove them when they shouldn't be displayed.
Well, that was a planned idea.
The problem is the maps from vgmaps.com already have the objects baked in. I could try contacting the author to remove that layer (surely they have some Gimp/Photoshop file). Or I'd have to figure out a way to map the whole map using the emulator's layers, but then you'd only see where the player travelled, or I'd have to walk around the whole map.
Or maybe just blit the 2 leftmost columns onto the old world map? So many ways...
Use some fancy algorithm to detect objects like koopas, and see what should be there, and replace them with that.
If only it were as easy as it sounds...
Yeah, it isn't unless you're good with math, and can easily pick up Computer Vision topics.
What you have now is pretty close, at least it looks a lot less wrong than before.
Is there anything else that can be optimized there?
Without substantially changing the method, not really. With this brute-force approach it's basically about lowering the number of pixels tested, i.e. you want to test as few pixels as necessary to still get an accurate result.
I'd try to avoid the vertical checks if possible, as those will be slower: stepping down a column strides across rows in memory, which is cache-unfriendly. It would be faster to do 20 horizontal checks than 10 of each. I only added them because I was doing an exact edge check.
You could break out of the loop slightly faster. As soon as sse > best_sse, you could skip to the next test, instead of waiting until the end. However, if you are in the general location, it's possible the values are so close, that would only prevent the last few pixels from being scanned.
It would actually be easier to match a complex image, because all you would have to do is align maybe two edges. But with 8- or 16-bit video games you often have large blocks of flat color, so you get a lot of area that matches.