Allegro.cc - Online Community

[A5] Reading from bitmaps is amazingly slow
Chris Katko
Member #1,881
January 2002

I switched to memory bitmaps. It is faster but still way too slow.

All I'm doing is a single pass of two for-loops to convert a bitmap into an array.

Back in the A4 days I'd use getpixel for collision detection with pixel maps. But in A5 it's way too slow while the bitmap sits on the video card, so you'd use memory bitmaps. But memory bitmaps aren't as fast as they should be. Checking for collisions by using al_get_pixel on a memory bitmap in A5 is a noticeable burden in my profiling. So I convert the memory bitmap to an array. It works way faster... except for the conversion phase, which now gives my prototype game a huge startup time:

- Normal/video bitmaps: 70 seconds
- Memory bitmap: 10 seconds (definitely better!)

Source bitmap (PNG) is 10000x1500

10 seconds to do 10000x1500 = 15,000,000 reads. 1.5 million a second sounds fast but we're talking simple reads.
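The arithmetic above can be checked with a short sketch. The function name `reads_per_second` is hypothetical and only here for illustration; the inputs are the figures quoted in this post.

```c
#include <assert.h>

/* Hypothetical helper (not part of Allegro): average pixel reads per second
   for one full pass over a width x height bitmap that took `seconds`. */
double reads_per_second(long width, long height, double seconds)
{
    assert(seconds > 0.0);
    return (double)(width * height) / seconds;
}
```

With the numbers above, `reads_per_second(10000, 1500, 10.0)` gives 1,500,000, and the 70-second video-bitmap run works out to roughly 214,000 reads per second.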

Maybe I'm thinking too hard and it's "fast" for what I'm doing and I'm just doing something "silly". My computer is a slower platform, a Celeron Chromebook. But I can run my game at 115+ FPS with 1200+ blended clouds being rendered just fine. But now, it takes 10+ seconds to boot it just to convert the bitmap to an array.

```d
void parse_map_bitmap_into_array()
{
    ALLEGRO_COLOR c;
    printf("Converting world bitmap [%i, %i] to array. This will take a moment.\n", map_mbmp.w, map_mbmp.h);
    for(int i = 0; i < map_mbmp.w; i++)
    {
        if(i % 1000 == 0)printf("%i\n", i);

        for(int j = 0; j < map_mbmp.h; j++)
        {
            c = al_get_pixel(map_mbmp, i, j);
            if(c.r + c.g + c.b > .1)
            {
                map_data[i][j] = true;
            }
        }
    }
}
```

Removing the only non-allegro reference, map_data, has no effect on time.

I'm using Linux, Ubuntu ~14.04, Allegro ~5.2.2 (recently compiled from git). 64-bit OS.

[edit] Also, just because I love inxi, here's the output:

```
inxi -Fx
System:    Host: saturn Kernel: 4.10.0-041000-generic x86_64 (64 bit gcc: 6.2.0)
           Desktop: Unity 7.4.5 (Gtk 3.18.9) Distro: Ubuntu 16.04 xenial
Machine:   System: GOOGLE (portable) product: Peppy v: 1.0
           Mobo: GOOGLE model: Peppy v: 1.0 Bios: coreboot v: 4.0-6588-g4acd8ea-dirty date: 09/04/2014
CPU:       Dual core Intel Celeron 2955U (-MCP-) cache: 2048 KB
           flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) bmips: 5587
           clock speeds: max: 1400 MHz 1: 799 MHz 2: 799 MHz
Graphics:  Card: Intel Haswell-ULT Integrated Graphics Controller bus-ID: 00:02.0
           Display Server: X.Org 1.18.4 drivers: intel (unloaded: fbdev,vesa) Resolution: 1366x768@60.00hz
           GLX Renderer: Mesa DRI Intel Haswell Mobile GLX Version: 3.0 Mesa 17.2.4 Direct Rendering: Yes
Audio:     Card-1 Intel 8 Series HD Audio Controller driver: snd_hda_intel bus-ID: 00:1b.0
           Card-2 Intel Haswell-ULT HD Audio Controller driver: snd_hda_intel bus-ID: 00:03.0
           Sound: Advanced Linux Sound Architecture v: k4.10.0-041000-generic
Network:   Card: Qualcomm Atheros AR9462 Wireless Network Adapter driver: ath9k bus-ID: 01:00.0
           IF: wlan0 state: up mac: 90:48:9a:75:d2:f5
Drives:    HDD Total Size: 63.9GB (53.2% used) ID-1: /dev/sda model: KINGSTON_SNS4151 size: 32.0GB
           ID-2: USB /dev/sdb model: Power_Saving_USB size: 31.9GB
Partition: ID-1: / size: 28G used: 23G (86%) fs: ext4 dev: /dev/sda1
           ID-2: swap-1 size: 2.08GB used: 0.53GB (25%) fs: swap dev: /dev/sda5
RAID:      No RAID devices: /proc/mdstat, md_mod kernel module present
Sensors:   System Temperatures: cpu: 40.8C mobo: N/A
           Fan Speeds (in rpm): cpu: N/A
Info:      Processes: 255 Uptime: 14:03 Memory: 1363.1/1872.9MB Init: systemd runlevel: 5 Gcc sys: 5.4.0
           Client: Shell (bash 4.3.481) inxi: 2.2.35
```

-----sig:
“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs

SiegeLord
Member #7,827
October 2006

Surround your two loops with al_lock_bitmap and al_unlock_bitmap. This is actually mentioned in the docs :).

"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18
[SiegeLord's Abode][Codes]:[DAllegro5]:[RustAllegro]

Chris Katko
Member #1,881
January 2002

But why do memory bitmaps need to be locked? They're already in memory...

[edit]

Okay, with locking, it's 8 seconds.

Still... :/

al_lock_bitmap(map_mbmp, g.ALLEGRO_PIXEL_FORMAT_ANY, ALLEGRO_LOCK_READONLY);


SiegeLord
Member #7,827
October 2006

Oh, didn't notice it was already a memory bitmap, interesting that it helps! There is still quite a bit of conversion that happens even in this case and it's not really inlinable. You might get more speed by locking the bitmap and then going through the memory representations directly.
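A minimal sketch of what walking the locked memory directly could look like. To stay self-contained it does not call Allegro: `LockedRegion` is a hypothetical stand-in modeled on the `data`, `pitch` and `pixel_size` fields of `ALLEGRO_LOCKED_REGION`, and it assumes the bitmap was locked in a concrete 32-bit ARGB_8888 format rather than `ALLEGRO_PIXEL_FORMAT_ANY`. The function name and the row-major `out` layout are assumptions too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ALLEGRO_LOCKED_REGION so the sketch compiles
   without Allegro. */
typedef struct {
    void *data;      /* first byte of row 0 */
    int pitch;       /* bytes from one row to the next; may be negative */
    int pixel_size;  /* bytes per pixel */
} LockedRegion;

/* Set out[y * w + x] to true where r + g + b exceeds 10% of one channel's
   full scale, mirroring the c.r + c.g + c.b > .1 test on float colors.
   Assumes the region holds ARGB_8888 pixels. */
void threshold_locked_argb8888(const LockedRegion *lr, int w, int h, bool *out)
{
    assert(lr != NULL && out != NULL && lr->pixel_size == 4);
    for (int y = 0; y < h; y++) {
        /* Step by pitch, not w * 4: rows may be padded or stored bottom-up. */
        const uint8_t *row = (const uint8_t *)lr->data + (ptrdiff_t)y * lr->pitch;
        for (int x = 0; x < w; x++) {
            uint32_t p;
            memcpy(&p, row + (size_t)x * 4, sizeof p);  /* alignment-safe read */
            unsigned r = (p >> 16) & 0xFF;
            unsigned g = (p >> 8) & 0xFF;
            unsigned b = p & 0xFF;
            /* (r + g + b) / 255.0 > 0.1  is the same as  r + g + b > 25 */
            out[(size_t)y * (size_t)w + (size_t)x] = (r + g + b) > 25;
        }
    }
}
```

In a real program the region would presumably come from `al_lock_bitmap(bitmap, ALLEGRO_PIXEL_FORMAT_ARGB_8888, ALLEGRO_LOCK_READONLY)`, with `al_unlock_bitmap(bitmap)` after the loops. Iterating rows in the outer loop also keeps the memory access sequential.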


Chris Katko
Member #1,881
January 2002

I've added a more detailed timekeeping mechanism. Previously I was using /usr/bin/time -v ./my_program and queuing up some KEY_ESCAPE events while it loaded, which adds some error/variance.
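As a rough sketch, one way to time a single function from inside the program is shown below. It uses the standard C `clock()`, which reports CPU time rather than wall-clock time. `time_call_seconds` and `busy_work` are hypothetical names, and this is not necessarily the mechanism used in my program.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: run fn() once and return the CPU time it used,
   in seconds, as measured by clock(). */
double time_call_seconds(void (*fn)(void))
{
    assert(fn != NULL);
    clock_t start = clock();
    fn();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Hypothetical stand-in for the bitmap-to-array conversion being measured.
   The volatile accumulator keeps the loop from being optimized away. */
volatile unsigned long long busy_sink;

void busy_work(void)
{
    for (unsigned long long i = 0; i < 1000000ULL; i++)
        busy_sink += i;
}
```

Calling `time_call_seconds(busy_work)` returns only the time spent inside `busy_work`, so startup, PNG loading and shutdown are excluded from the measurement.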

I'd be fine with direct bitmap access. Are there any A5 examples that expose this functionality? I had the impression that A5 had a more "hands off"/"don't touch internals" approach.

I load a bitmap that represents a per-pixel textured world, so I draw it to the screen. But I also check pixels for collisions between objects/particles and that terrain.

I have no problem having two separate data structures (one texture, one array for collision)--though it may get slow if I start allowing deformable terrain.

My only issue right now, however, is that the conversion is really slow. Reading every pixel takes more time than loading the PNG and uncompressing it! That can't be right! :)

Oh, I almost forgot. I am running a profile-mode cmake'd build of Allegro 5, and I don't know whether Allegro 5's profile build is also full-debug / no optimization. It's possible that a release build will be much faster and that this is only due to tons of additional debug error checking.

Thanks for the help! Hope you're having a great Holiday/Christmas.

[edit] Also, while I've got your ear. This is "off-topic" but still a bug AFAIK. It seems that many allegro flags don't get exposed in DAllegro. So I have to look them up with grep in Allegro 5's source code, find the flag, and then hardcode it into my D program. ALLEGRO_MEMORY_BITMAP works, but ALLEGRO_PIXEL_FORMAT_ANY and ALLEGRO_VSYNC, I definitely had to add.

[edit]

I think I tracked down the relevant code to /include/allegro5/internal/aintern_pixel.h

```c
#define _AL_INLINE_GET_PIXEL(format, data, color, advance) \
   do { \
      switch (format) { \
         case ALLEGRO_PIXEL_FORMAT_ARGB_8888: { \
            uint32_t _gp_pixel = *(uint32_t *)(data); \
            _AL_MAP_RGBA(color, \
               (_gp_pixel & 0x00FF0000) >> 16, \
               (_gp_pixel & 0x0000FF00) >> 8, \
               (_gp_pixel & 0x000000FF) >> 0, \
               (_gp_pixel & 0xFF000000) >> 24); \
            if (advance) \
               data += 4; \
            break; \
         } \
         \
         case ALLEGRO_PIXEL_FORMAT_RGBA_8888: { \
            uint32_t _gp_pixel = *(uint32_t *)(data); \
            _AL_MAP_RGBA(color, \
               (_gp_pixel & 0xFF000000) >> 24, \
               (_gp_pixel & 0x00FF0000) >> 16, \
               (_gp_pixel & 0x0000FF00) >> 8, \
               (_gp_pixel & 0x000000FF) >> 0); \
            if (advance) \
               data += 4; \
            break; \
         } \
         \
         case ALLEGRO_PIXEL_FORMAT_ARGB_4444: { \
            uint16_t _gp_pixel = *(uint16_t *)(data); \
            _AL_MAP_RGBA(color, \
               _al_rgb_scale_4[(_gp_pixel & 0x0F00) >> 8], \
               _al_rgb_scale_4[(_gp_pixel & 0x00F0) >> 4], \
               _al_rgb_scale_4[(_gp_pixel & 0x000F)], \
               _al_rgb_scale_4[(_gp_pixel & 0xF000) >> 12]); \
            if (advance) \
               data += 2; \
            break; \
         } \
         \
         case ALLEGRO_PIXEL_FORMAT_RGB_888: { \
            uint32_t _gp_pixel = READ3BYTES(data); \
            _AL_MAP_RGBA(color, \
               (_gp_pixel & 0xFF0000) >> 16, \
               (_gp_pixel & 0x00FF00) >> 8, \
               (_gp_pixel & 0x0000FF) >> 0, \
               255); \
            if (advance) \
               data += 3; \
            break; \
         } \
         \
         case ALLEGRO_PIXEL_FORMAT_RGB_565: { \
            uint16_t _gp_pixel = *(uint16_t *)(data); \
            _AL_MAP_RGBA(color, \
               _al_rgb_scale_5[(_gp_pixel & 0xF800) >> 11], \
               _al_rgb_scale_6[(_gp_pixel & 0x07E0) >> 5], \
               _al_rgb_scale_5[(_gp_pixel & 0x001F)], \
               255); \
            if (advance) \
               data += 2; \
            break; \
         } \
         \
         case ALLEGRO_PIXEL_FORMAT_RGB_555: { \
            uint16_t _gp_pixel = *(uint16_t *)(data); \
            _AL_MAP_RGBA(color, \
               _al_rgb_scale_5[(_gp_pixel & 0x7C00) >> 10], \
               _al_rgb_scale_5[(_gp_pixel & 0x03E0) >> 5], \
               _al_rgb_scale_5[(_gp_pixel & 0x001F)], \
               255); \
            if (advance) \
               data += 2; \
            break; \
         } \
         \
         case ALLEGRO_PIXEL_FORMAT_RGBA_5551: { \
            uint16_t _gp_pixel = *(uint16_t *)(data); \
            _AL_MAP_RGBA(color, \
               _al_rgb_scale_5[(_gp_pixel & 0xF800) >> 11], \
               _al_rgb_scale_5[(_gp_pixel & 0x07C0) >> 6], \
               _al_rgb_scale_5[(_gp_pixel & 0x003E) >> 1], \
               _al_rgb_scale_1[_gp_pixel & 1]); \
            if (advance) \
               data += 2; \
            break; \
         } \
         \
         case ALLEGRO_PIXEL_FORMAT_ARGB_1555: { \
            uint16_t _gp_pixel = *(uint16_t *)(data); \
            _AL_MAP_RGBA(color, \
               _al_rgb_scale_5[(_gp_pixel & 0x7C00) >> 10], \
               _al_rgb_scale_5[(_gp_pixel & 0x03E0) >> 5], \
               _al_rgb_scale_5[(_gp_pixel & 0x001F)], \
               _al_rgb_scale_1[(_gp_pixel & 0x8000) >> 15]); \
            if (advance) \
               data += 2; \
            break; \
         } \
         \
         case ALLEGRO_PIXEL_FORMAT_ABGR_8888: { \
            uint32_t _gp_pixel = *(uint32_t *)(data); \
            _AL_MAP_RGBA(color, \
               (_gp_pixel & 0x000000FF) >> 0, \
               (_gp_pixel & 0x0000FF00) >> 8, \
               (_gp_pixel & 0x00FF0000) >> 16, \
               (_gp_pixel & 0xFF000000) >> 24); \
            if (advance) \
               data += 4; \
            break; \
         } \
         \
         case ALLEGRO_PIXEL_FORMAT_XBGR_8888: { \
            uint32_t _gp_pixel = *(uint32_t *)(data); \
            _AL_MAP_RGBA(color, \
               (_gp_pixel & 0x000000FF) >> 0, \
               (_gp_pixel & 0x0000FF00) >> 8, \
               (_gp_pixel & 0x00FF0000) >> 16, \
               255); \
            if (advance) \
               data += 4; \
            break; \
         } \
         \
         case ALLEGRO_PIXEL_FORMAT_BGR_888: { \
            uint32_t _gp_pixel = READ3BYTES(data); \
            _AL_MAP_RGBA(color, \
               (_gp_pixel & 0x000000FF) >> 0, \
               (_gp_pixel & 0x0000FF00) >> 8, \
               (_gp_pixel & 0x00FF0000) >> 16, \
               255); \
            if (advance) \
               data += 4; \
            break; \
         } \
         \
         case ALLEGRO_PIXEL_FORMAT_BGR_565: { \
            uint16_t _gp_pixel = *(uint16_t *)(data); \
            _AL_MAP_RGBA(color, \
               _al_rgb_scale_5[(_gp_pixel & 0x001F)], \
               _al_rgb_scale_6[(_gp_pixel & 0x07E0) >> 5], \
               _al_rgb_scale_5[(_gp_pixel & 0xF800) >> 11], \
               255); \
            if (advance) \
               data += 2; \
            break; \
         } \
         \
         case ALLEGRO_PIXEL_FORMAT_BGR_555: { \
            uint16_t _gp_pixel = *(uint16_t *)(data); \
            _AL_MAP_RGBA(color, \
               _al_rgb_scale_5[(_gp_pixel & 0x001F)], \
               _al_rgb_scale_5[(_gp_pixel & 0x03E0) >> 5], \
               _al_rgb_scale_5[(_gp_pixel & 0x7C00) >> 10], \
               255); \
            if (advance) \
               data += 2; \
            break; \
         } \
         \
         case ALLEGRO_PIXEL_FORMAT_RGBX_8888: { \
            uint32_t _gp_pixel = *(uint32_t *)(data); \
            _AL_MAP_RGBA(color, \
               (_gp_pixel & 0xFF000000) >> 24, \
               (_gp_pixel & 0x00FF0000) >> 16, \
               (_gp_pixel & 0x0000FF00) >> 8, \
               255); \
            if (advance) \
               data += 4; \
            break; \
         } \
         \
         case ALLEGRO_PIXEL_FORMAT_XRGB_8888: { \
            uint32_t _gp_pixel = *(uint32_t *)(data); \
            _AL_MAP_RGBA(color, \
               (_gp_pixel & 0x00FF0000) >> 16, \
               (_gp_pixel & 0x0000FF00) >> 8, \
               (_gp_pixel & 0x000000FF), \
               255); \
            if (advance) \
               data += 4; \
            break; \
         } \
         \
         case ALLEGRO_PIXEL_FORMAT_ABGR_F32: { \
            float *f = (float *)data; \
            color.r = f[0]; \
            color.g = f[1]; \
            color.b = f[2]; \
            color.a = f[3]; \
            if (advance) \
               data += 4 * sizeof(float); \
            break; \
         } \
         \
         case ALLEGRO_PIXEL_FORMAT_ABGR_8888_LE: { \
            uint8_t *p = (uint8_t *)data; \
            _AL_MAP_RGBA(color, *p, *(p + 1), *(p + 2), *(p + 3)); \
            if (advance) \
               data += 4; \
            break; \
         } \
         \
         case ALLEGRO_PIXEL_FORMAT_RGBA_4444: { \
            uint16_t _gp_pixel = *(uint16_t *)(data); \
            _AL_MAP_RGBA(color, \
               _al_rgb_scale_4[(_gp_pixel & 0xF000) >> 12], \
               _al_rgb_scale_4[(_gp_pixel & 0x0F00) >> 8], \
               _al_rgb_scale_4[(_gp_pixel & 0x00F0) >> 4], \
               _al_rgb_scale_4[(_gp_pixel & 0x000F)]); \
            if (advance) \
               data += 2; \
            break; \
         } \
         \
         case ALLEGRO_PIXEL_FORMAT_SINGLE_CHANNEL_8: { \
            uint8_t c = *(uint8_t *)(data); \
            _AL_MAP_RGBA(color, c, c, c, 255); \
            if (advance) \
               data += 2; \
            break; \
         } \
         \
         case ALLEGRO_PIXEL_FORMAT_ANY: \
         case ALLEGRO_PIXEL_FORMAT_ANY_NO_ALPHA: \
         case ALLEGRO_PIXEL_FORMAT_ANY_WITH_ALPHA: \
         case ALLEGRO_PIXEL_FORMAT_ANY_15_NO_ALPHA: \
         case ALLEGRO_PIXEL_FORMAT_ANY_16_NO_ALPHA: \
         case ALLEGRO_PIXEL_FORMAT_ANY_16_WITH_ALPHA: \
         case ALLEGRO_PIXEL_FORMAT_ANY_24_NO_ALPHA: \
         case ALLEGRO_PIXEL_FORMAT_ANY_32_NO_ALPHA: \
         case ALLEGRO_PIXEL_FORMAT_ANY_32_WITH_ALPHA: \
            ALLEGRO_ERROR("INLINE_GET got fake pixel format: %d\n", format); \
            abort(); \
            break; \
         \
         case ALLEGRO_PIXEL_FORMAT_COMPRESSED_RGBA_DXT1: \
         case ALLEGRO_PIXEL_FORMAT_COMPRESSED_RGBA_DXT3: \
         case ALLEGRO_PIXEL_FORMAT_COMPRESSED_RGBA_DXT5: \
            ALLEGRO_ERROR("INLINE_GET got compressed format: %d\n", format); \
            abort(); \
            break; \
         \
         case ALLEGRO_NUM_PIXEL_FORMATS: \
         default: \
            ALLEGRO_ERROR("INLINE_GET got non pixel format: %d\n", format); \
            abort(); \
            break; \
      } \
   } while (0)
```

Perhaps as long as I ensure the bitmap format, I can just use the relevant case in here. No wonder getpixel is so slow! There are branches and branches and cases and cases! Making sure it's locked, and if not, what to do. Making sure it's the right format. Whether it's a sub-bitmap or not. Clipping. Goodness!
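If the format is pinned down, the per-pixel work reduces to one case of the macro. The sketch below restates the ALLEGRO_PIXEL_FORMAT_ARGB_8888 case as plain C. `Color`, `decode_argb8888` and `is_solid_argb8888` are hypothetical names, and dividing by 255 is an assumption about what `_AL_MAP_RGBA` does with 8-bit components.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for ALLEGRO_COLOR: four floats in [0, 1]. */
typedef struct { float r, g, b, a; } Color;

/* Decode one 32-bit ARGB_8888 pixel with the same masks and shifts as the
   ARGB_8888 case of _AL_INLINE_GET_PIXEL, without the format switch. */
Color decode_argb8888(uint32_t pixel)
{
    Color c;
    c.r = (float)((pixel & 0x00FF0000u) >> 16) / 255.0f;
    c.g = (float)((pixel & 0x0000FF00u) >> 8) / 255.0f;
    c.b = (float)((pixel & 0x000000FFu) >> 0) / 255.0f;
    c.a = (float)((pixel & 0xFF000000u) >> 24) / 255.0f;
    return c;
}

/* The collision test from the conversion loop, applied to a raw pixel. */
bool is_solid_argb8888(uint32_t pixel)
{
    Color c = decode_argb8888(pixel);
    return c.r + c.g + c.b > 0.1f;
}
```

This skips the lock, clipping and sub-bitmap checks entirely, so it is only safe on memory that is known to hold ARGB_8888 pixels.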

[edit]

It's not allegro at all!!!

DMD's profiling switch is EXPLODING the call time. Timing just the bitmap -> array function, without --profile it takes 1.03 seconds! With --profile, it takes 6 seconds. --profile-gc (garbage collections) doesn't affect it noticeably, just --profile.

It's possible that, because al_get_pixel makes tons of really short function calls, the tracing functions themselves become a huge overhead. I'm going to do a follow-up on the dlang.org forums. I'm also going to test LDC's profiling, which (AFAIK) uses completely different profiling functions/instrumentation.

One second to process 15 million pixels with all those extra clipping/locking/pixel-format checks on a humble Celeron netbook is not surprising or unreasonable. I'm probably going to move forward with a specific internal al_get_pixel function or code snippet, as well as follow up on the profiling.

The clue came to me when I was running Valgrind tests on it. I kept getting functions called "trace" taking large amounts of time, and they were embedded in Allegro functions/etc. And then the dumb revelation finally dawned on me. "DUH! I had --profile on!" I've been coding with it on and never had any problem with it. But I think maybe this specific use case explodes the overhead of whatever criteria/method/algorithm they're using for tracing in DMD.

