I switched to memory bitmaps. It is faster but still way too slow.
All I'm doing is a single pass of two nested for-loops to convert a bitmap into an array.
Back in the A4 days I'd use al_get_pixel for collision detection with pixel maps. But in A5 it's way too slow with the bitmap sitting on the video card, so you'd use memory bitmaps. But memory bitmaps aren't as fast as they should be either. Checking for collisions by using al_get_pixel on a memory bitmap in A5 is a noticeable burden in my profiling. So I convert the memory bitmap to an array. It works way faster... except for the conversion phase, which now gives my prototype game a huge startup time:
- Normal/video bitmaps: 70 seconds
- Memory bitmap: 10 seconds (definitely better!)
Source bitmap (PNG) is 10000x1500
10 seconds to do 10000x1500 = 15,000,000 reads. 1.5 million a second sounds fast but we're talking simple reads.
Maybe I'm thinking too hard and it's "fast" for what I'm doing and I'm just doing something "silly". My computer is a slower platform, a Celeron Chromebook. But I can run my game at 115+ FPS with 1200+ blended clouds being rendered just fine. But now, it takes 10+ seconds to boot it just to convert the bitmap to an array.
Removing the only non-allegro reference, map_data, has no effect on time.
I'm using Linux, Ubuntu ~14.04, Allegro ~5.2.2 (recently compiled from git). 64-bit OS.
[edit] Also, just because I love inxi, here's the output:
Surround your two loops with al_lock_bitmap and al_unlock_bitmap. This is actually mentioned in the docs.
But why do memory bitmaps need to be locked? They're already in memory...
[edit]
Okay, with locking, it's 8 seconds.
Still... :/
al_lock_bitmap(map_mbmp, g.ALLEGRO_PIXEL_FORMAT_ANY, ALLEGRO_LOCK_READONLY);
Oh, didn't notice it was already a memory bitmap, interesting that it helps! There is still quite a bit of conversion that happens even in this case and it's not really inlinable. You might get more speed by locking the bitmap and then going through the memory representations directly.
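As a rough illustration of "going through the memory representations directly", here is a minimal sketch in C. It assumes the bitmap was locked with a known 32-bit format such as ALLEGRO_PIXEL_FORMAT_ABGR_8888_LE (bytes in memory are R, G, B, A) rather than ALLEGRO_PIXEL_FORMAT_ANY, and that the collision array only needs "solid if alpha is non-zero"; that rule is a guess, not something stated in the thread. The struct is a stand-in that mirrors the fields of ALLEGRO_LOCKED_REGION so the sketch compiles without Allegro; in real code the pointer returned by al_lock_bitmap would supply data and pitch, and solid_mask_from_region is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in mirroring the fields of ALLEGRO_LOCKED_REGION, so this
   sketch builds without Allegro installed. */
struct locked_region {
    void *data;      /* pointer to the first row (y = 0) */
    int format;      /* pixel format id (unused here) */
    int pitch;       /* bytes from one row to the next; may be negative */
    int pixel_size;  /* bytes per pixel */
};

/* Hypothetical helper: fills mask[y * w + x] with 1 where the pixel's
   alpha byte is non-zero, 0 otherwise.
   ASSUMPTION: 4 bytes per pixel laid out as R, G, B, A. */
void solid_mask_from_region(const struct locked_region *lr,
                            int w, int h, uint8_t *mask)
{
    assert(lr != NULL && lr->data != NULL && mask != NULL);
    assert(lr->pixel_size == 4);

    const uint8_t *first = (const uint8_t *)lr->data;
    for (int y = 0; y < h; y++) {
        /* Step by pitch, not by w * 4: rows can be padded or stored
           bottom-up (negative pitch). */
        const uint8_t *row = first + (ptrdiff_t)y * lr->pitch;
        uint8_t *out = mask + (size_t)y * (size_t)w;
        for (int x = 0; x < w; x++) {
            out[x] = (uint8_t)(row[(size_t)x * 4 + 3] != 0);
        }
    }
}
```

The inner loop is one byte load and one compare per pixel, with none of the per-call checks that al_get_pixel has to make.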
I've added a more detailed timekeeping mechanism. Before, I was using /usr/bin/time -v ./my_program and queuing up some KEY_ESCAPE events while it loaded, which adds some error/variance.
I'd be fine with direct bitmap access. Are there any A5 examples that expose this functionality? I had the impression that A5 had a more "hands off"/"don't touch internals" approach.
I load a bitmap that represents a per-pixel textured world, so I draw it to the screen. But I also check pixels for collisions between objects/particles and that terrain.
I have no problem having two separate data structures (one texture, one array for collision), though keeping both updated may turn out to be slow if I start allowing deformable terrain.
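To make the "separate array for collision" idea concrete, here is a minimal sketch in C of a flat collision map with a bounds-checked lookup and a simple deformation step. All names (collision_map, cm_is_solid, cm_carve_circle) are hypothetical, and the one-byte-per-pixel layout is an assumption. The texture would still need its own update for the carved area; the sketch covers only the collision side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical collision map: one byte per pixel, row-major, 1 = solid. */
struct collision_map {
    int w, h;
    uint8_t *solid;  /* w * h entries */
};

/* Bounds-checked lookup; anything outside the map counts as empty. */
bool cm_is_solid(const struct collision_map *cm, int x, int y)
{
    assert(cm != NULL && cm->solid != NULL);
    if (x < 0 || y < 0 || x >= cm->w || y >= cm->h)
        return false;
    return cm->solid[(size_t)y * (size_t)cm->w + (size_t)x] != 0;
}

/* Clears a filled circle (e.g. an explosion crater) and returns how
   many cells actually changed from solid to empty. */
int cm_carve_circle(struct collision_map *cm, int cx, int cy, int r)
{
    assert(cm != NULL && cm->solid != NULL && r >= 0);
    int changed = 0;
    for (int y = cy - r; y <= cy + r; y++) {
        if (y < 0 || y >= cm->h)
            continue;
        for (int x = cx - r; x <= cx + r; x++) {
            if (x < 0 || x >= cm->w)
                continue;
            int dx = x - cx, dy = y - cy;
            if (dx * dx + dy * dy > r * r)
                continue;
            uint8_t *cell = &cm->solid[(size_t)y * (size_t)cm->w + (size_t)x];
            if (*cell) {
                *cell = 0;
                changed++;
            }
        }
    }
    return changed;
}
```

Only the cells inside the circle's bounding box are visited, so the cost of a deformation scales with the crater size rather than with the 10000x1500 map.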
My only issue right now, however, is that the conversion is really slow. It takes less time to load a PNG and decompress it than it does to read every pixel! That can't be right!
Oh, I almost forgot. I'm running an Allegro 5 build that was cmake'd in profile mode. I don't know whether the profile build is also full-debug / unoptimized, so it's possible a release build will be much faster and this is only due to tons of additional debug error checking.
Thanks for the help! Hope you're having a great Holiday/Christmas.
[edit] Also, while I've got your ear. This is "off-topic" but still a bug AFAIK. It seems that many allegro flags don't get exposed in DAllegro. So I have to look them up with grep in Allegro 5's source code, find the flag, and then hardcode it into my D program. ALLEGRO_MEMORY_BITMAP works, but ALLEGRO_PIXEL_FORMAT_ANY and ALLEGRO_VSYNC, I definitely had to add.
[edit]
I think I tracked down the relevant code to /include/allegro5/internal/aintern_pixel.h
Perhaps as long as I ensure the bitmap format, I can just use the relevant case in here. No wonder getpixel is so slow! There's branches and branches and cases and cases! Making sure it's locked, and if not, what to do. Making sure it's the right format. Whether it's a sub-bitmap or not. Clipping. Goodness!
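The "just use the relevant case" idea could look roughly like this in C: a single-pixel read that assumes one fixed format and skips every other branch. This is a sketch, not Allegro's internal code. It assumes a 32-bit R, G, B, A byte layout (what ALLEGRO_PIXEL_FORMAT_ABGR_8888_LE guarantees), takes data and pitch as they would come from a locked region, and leaves clipping to the caller. The names rgba8 and read_rgba8 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-bit-per-channel color, in memory order. */
struct rgba8 {
    uint8_t r, g, b, a;
};

/* Reads one pixel from a locked region of a known format.
   ASSUMPTIONS: 4 bytes per pixel stored as R, G, B, A; the caller has
   already checked that (x, y) lies inside the locked area. */
struct rgba8 read_rgba8(const void *data, int pitch, int x, int y)
{
    assert(data != NULL && x >= 0 && y >= 0);
    /* pitch is signed: rows may be stored bottom-up. */
    const uint8_t *p = (const uint8_t *)data
                     + (ptrdiff_t)y * pitch
                     + (ptrdiff_t)x * 4;
    struct rgba8 c = { p[0], p[1], p[2], p[3] };
    return c;
}
```

With the format pinned at lock time, the per-pixel work shrinks to an address computation and four byte loads. The trade-off is that the function silently returns garbage if the bitmap was locked in any other format.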
[edit]
It's not allegro at all!!!
DMD's profiling switch is EXPLODING the call time. Timing just the bitmap -> array function, without --profile it takes 1.03 seconds! With --profile, it takes 6 seconds. --profile-gc (garbage collections) doesn't affect it noticeably, just --profile.
It's possible that because al_get_pixel makes tons of really short function calls, the tracing functions themselves become a huge overhead. I'm going to do a follow-up on the dlang.org forums. I'm also going to test LDC's profiling, which (AFAIK) uses completely different profiling functions/instrumentation.
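A toy model in C of why per-function tracing hurts tiny functions so much. It assumes the profiler adds one hook call at entry and one at exit of every instrumented function; that is an assumption for illustration, not a description of how DMD actually implements its tracing. The hooks here only bump a counter, and every name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instrumentation: counts how often the hooks fire. */
unsigned long hook_calls = 0;

void trace_enter(void) { hook_calls++; }
void trace_exit(void)  { hook_calls++; }

/* The "real work": a single byte load. */
uint8_t tiny_read(const uint8_t *px, size_t i)
{
    return px[i];
}

/* The same read as an instrumented build would see it: two hook calls
   wrapped around one byte load. */
uint8_t tiny_read_traced(const uint8_t *px, size_t i)
{
    trace_enter();
    uint8_t v = tiny_read(px, i);
    trace_exit();
    return v;
}

/* Sums n pixels through the traced reader; afterwards hook_calls has
   grown by 2 * n. */
unsigned long sum_traced(const uint8_t *px, size_t n)
{
    assert(px != NULL);
    unsigned long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += tiny_read_traced(px, i);
    return sum;
}
```

Under this model, 15,000,000 pixel reads trigger 30,000,000 hook calls, each doing far more bookkeeping in a real profiler than the byte load it wraps. That would fit a jump from about 1 second to 6 seconds.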
One second to process 15 million pixels with all those extra clipping/locking/pixel-format checks on a humble Celeron netbook is not surprising or unreasonable. I'm probably going to move forward with a specific internal al_get_pixel function or code snippet, as well as follow up on the profiling.
The clue came to me when I was running Valgrind tests on it. I kept getting functions called "trace" taking large amounts of time, and they were embedded in Allegro functions/etc. And then the dumb revelation finally dawned on me. "DUH! I had --profile on!" I've been coding with it on and never had any problem with it. But I think maybe this specific use case explodes the overhead of whatever criteria/method/algorithm they're using for tracing in DMD.