Alright, I did some benchmarks on the pixel operations:
Much to my surprise, the al_put_pixel() and al_get_pixel() functions are quite a bit faster than I imagined they would be. I had thought writing a direct memory write function on a locked region would be faster.
Basically, under no circumstances would you want to use them without first locking your bitmap, as you can see from the chart. As remarkably fast as these functions are on a locked bitmap, they're just as remarkably slow on an unlocked one. al_draw_pixel() is reasonable as long as you don't want to draw a lot of pixels.
The locking function itself can be kind of slow, however. On my system, locking about 30 320x240 bitmaps per frame in their native pixel format would probably be enough on its own to drag the frame rate below 60fps, with nothing else happening.
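For a rough sense of scale, here's the arithmetic behind that estimate as a small sketch. It assumes the "about 30 locks" figure is taken literally as filling one whole 60fps frame. The helper names are made up for illustration, and the result is a ballpark figure, not a measurement.

```cpp
// Back-of-the-envelope budget implied by the estimate above.
// Assumption: ~30 lock/unlock pairs use up one entire frame at 60 fps.

// Time available per frame, in milliseconds.
constexpr double frame_budget_ms(double fps)
{
    return 1000.0 / fps;
}

// Implied cost of a single lock if `locks_per_frame` of them fill the frame.
constexpr double per_lock_ms(double fps, int locks_per_frame)
{
    return frame_budget_ms(fps) / locks_per_frame;
}
```

With 60fps and 30 locks that works out to a bit over half a millisecond per lock, so even a handful of locks per frame takes a real bite out of the budget.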
Also, there were no noticeable differences between ALLEGRO_LOCK_READONLY, ALLEGRO_LOCK_WRITEONLY, and ALLEGRO_LOCK_READWRITE.
write_pixel_argb_8888() is as follows:
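inline void write_pixel_argb_8888(ALLEGRO_LOCKED_REGION *region, int x, int y, ALLEGRO_COLOR &col)
{
    unsigned char r, g, b, a;
    al_unmap_rgba(col, &r, &g, &b, &a);
    // pitch is in bytes; divide by 4 to step in 32-bit pixels
    uint32_t *ptr32 = (uint32_t *)region->data + x + y*(region->pitch/4);
    // pack as ARGB; cast before shifting so a << 24 cannot overflow a signed int
    *ptr32 = ((uint32_t)a << 24) | (r << 16) | (g << 8) | b;
}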
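If you want to check the row arithmetic in write_pixel_argb_8888() without pulling in Allegro, here is a minimal sketch against a plain memory buffer. FakeLockedRegion and write_argb_8888() are hypothetical stand-ins that only mirror the data and pitch fields used above. The sketch steps rows in bytes, which as far as I know also covers the case where Allegro reports a negative pitch (rows stored bottom-up).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the two ALLEGRO_LOCKED_REGION fields used above,
// so this compiles without Allegro.
struct FakeLockedRegion {
    void *data;  // points at the first pixel of row 0
    int pitch;   // bytes per row; negative when rows run backwards in memory
};

// Write one ARGB_8888 pixel at (x, y). No bounds checking, same as the original.
inline void write_argb_8888(FakeLockedRegion *region, int x, int y,
                            unsigned char r, unsigned char g,
                            unsigned char b, unsigned char a)
{
    // Move to row y in bytes first, then index the pixel within that row.
    unsigned char *row = static_cast<unsigned char *>(region->data)
                       + static_cast<std::ptrdiff_t>(y) * region->pitch;
    std::uint32_t *px = reinterpret_cast<std::uint32_t *>(row) + x;

    // Widen each channel before shifting so the packing stays unsigned.
    *px = (std::uint32_t(a) << 24) | (std::uint32_t(r) << 16)
        | (std::uint32_t(g) << 8)  |  std::uint32_t(b);
}
```

For a 4-pixel-wide buffer the pitch is 16 bytes, so writing to (1, 2) lands on element 2*4 + 1 of the underlying uint32_t array. With a pitch of -16 and data pointing at the last row in memory, row 2 ends up at the start of the buffer.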