Performance issue regarding al_put_pixel and raycasting
FlyingFromage

Hello, I'm following this tutorial to make a raycaster game like Wolfenstein 3D. The tutorial uses the SDL library instead of Allegro, and I've run into major performance problems in the second part, which covers textured walls. I converted its code to Allegro quite easily and everything works as intended, but performance drops significantly when I'm close to a wall and looking at it (presumably because the drawing loop has to run more times). I've searched for potential solutions and I think I've narrowed it down to al_put_pixel(). I've tried suggestions like locking bitmaps, and drawing to a separate buffer bitmap before drawing that to the back buffer, but none of them have worked. I don't know whether that's because I'm implementing them wrong, but I'd appreciate some help, and maybe specific suggestions for how to make the code more efficient. Thanks.

Here is the entire code. It's a bit messy because I copied and pasted bits from the tutorial linked above. If you would like clarification on something, please ask. Thanks for the help.

(I had to omit the definition of worldMap because this post is too long.)

  1 #include <cmath>
  2 #include <string>
  3 #include <vector>
  4 #include <iostream>
  5
  6 #include"allegro5/allegro5.h"
  7 #include"allegro5/allegro_primitives.h"
  8 #include"allegro5/allegro_image.h"
  9 #include"allegro5/allegro_font.h"
 10 #include"allegro5/allegro_ttf.h"
 11
 12 #include"stdint.h"
 13
 14 #define screenWidth 1280
 15 #define screenHeight 720
 16 #define mapWidth 24
 17 #define mapHeight 24
 18 #define texWidth 64
 19 #define texHeight 64
 20
 21 int worldMap[mapWidth][mapHeight];
 22
 23 bool keys[] = { false, false, false, false, false, false, false, false };
 24 enum keys { UP, DOWN, LEFT, RIGHT, SPACE, ENTER, MOUSELEFT, MOUSERIGHT };
 25
 26 int main()
 27 {
 28     bool done = false;
 29     bool redraw = false;
 30
 31     int mouseX = 0;
 32     int mouseY = 0;
 33
 34     double posX = 22, posY = 12; //x and y start position
 35     double dirX = -1, dirY = 0; //initial direction vector
 36     double planeX = 0, planeY = 0.66; //the 2d raycaster version of camera plane
 37
 38     double moveSpeed = 0.075; //the constant value is in squares/second
 39     double rotSpeed = 0.05; //the constant value is in radians/second
 40
 41     ALLEGRO_DISPLAY *display = NULL;
 42     ALLEGRO_EVENT_QUEUE *eventQueue = NULL;
 43     ALLEGRO_TIMER *timer = NULL;
 44
 45     ALLEGRO_FONT *font = NULL;
 46     ALLEGRO_BITMAP *titleImage = NULL;
 47
 48     srand(time(NULL));
 49
 50     al_init();
 51     al_init_primitives_addon();
 52     al_init_image_addon();
 53     al_init_font_addon();
 54     al_init_ttf_addon();
 55     al_install_mouse();
 56     al_install_keyboard();
 57
 58     al_set_new_bitmap_flags(ALLEGRO_VIDEO_BITMAP);
 59
 60     ALLEGRO_BITMAP* texture[6];
 61
 62     texture[0] = al_load_bitmap("redbrick.png");
 63     texture[1] = al_load_bitmap("eagle.png");
 64     texture[2] = al_load_bitmap("purplestone.png");
 65     texture[3] = al_load_bitmap("mossy.png");
 66     texture[4] = al_load_bitmap("bluestone.png");
 67     texture[5] = al_load_bitmap("colorstone.png");
 68
 69     for (int i = 0; i < 6; i++)
 70         al_lock_bitmap(texture[i], al_get_bitmap_format(texture[i]), ALLEGRO_LOCK_READONLY);
 71
 72     int w = screenWidth;
 73     int h = screenHeight;
 74
 75     display = al_create_display(w, h);
 76     eventQueue = al_create_event_queue();
 77     timer = al_create_timer(1.0 / 60);
 78
 79     al_register_event_source(eventQueue, al_get_timer_event_source(timer));
 80     al_register_event_source(eventQueue, al_get_display_event_source(display));
 81     al_register_event_source(eventQueue, al_get_keyboard_event_source());
 82     al_register_event_source(eventQueue, al_get_mouse_event_source());
 83
 84     al_start_timer(timer);
 85     while (!done)
 86     {
 87         ALLEGRO_EVENT ev;
 88         al_wait_for_event(eventQueue, &ev);
 89
 90         if (ev.type == ALLEGRO_EVENT_KEY_DOWN)
 91         {
 92             switch (ev.keyboard.keycode)
 93             {
 94             case ALLEGRO_KEY_UP:
 95                 keys[UP] = true;
 96                 break;
 97             case ALLEGRO_KEY_DOWN:
 98                 keys[DOWN] = true;
 99                 break;
100             case ALLEGRO_KEY_LEFT:
101                 keys[LEFT] = true;
102                 break;
103             case ALLEGRO_KEY_RIGHT:
104                 keys[RIGHT] = true;
105                 break;
106             case ALLEGRO_KEY_SPACE:
107                 keys[SPACE] = true;
108                 break;
109             case ALLEGRO_KEY_ENTER:
110                 keys[ENTER] = true;
111                 break;
112             }
113         }
114         else if (ev.type == ALLEGRO_EVENT_KEY_UP)
115         {
116             switch (ev.keyboard.keycode)
117             {
118             case ALLEGRO_KEY_UP:
119                 keys[UP] = false;
120                 break;
121             case ALLEGRO_KEY_DOWN:
122                 keys[DOWN] = false;
123                 break;
124             case ALLEGRO_KEY_LEFT:
125                 keys[LEFT] = false;
126                 break;
127             case ALLEGRO_KEY_RIGHT:
128                 keys[RIGHT] = false;
129                 break;
130             case ALLEGRO_KEY_SPACE:
131                 keys[SPACE] = false;
132                 break;
133             case ALLEGRO_KEY_ENTER:
134                 keys[ENTER] = false;
135                 break;
136             }
137         }
138         else if (ev.type == ALLEGRO_EVENT_MOUSE_BUTTON_DOWN)
139         {
140             if (ev.mouse.button & 1)
141             {
142                 keys[MOUSELEFT] = true;
143             }
144             else if (ev.mouse.button & 2)
145             {
146                 keys[MOUSERIGHT] = true;
147             }
148         }
149         else if (ev.type == ALLEGRO_EVENT_MOUSE_BUTTON_UP)
150         {
151             if (ev.mouse.button & 1)
152             {
153                 keys[MOUSELEFT] = false;
154             }
155             else if (ev.mouse.button & 2)
156             {
157                 keys[MOUSERIGHT] = false;
158             }
159         }
160         else if (ev.type == ALLEGRO_EVENT_MOUSE_AXES)
161         {
162             mouseX = ev.mouse.x;
163             mouseY = ev.mouse.y;
164         }
165         else if (ev.type == ALLEGRO_EVENT_DISPLAY_CLOSE)
166         {
167             done = true;
168         }
169         else if (ev.type == ALLEGRO_EVENT_TIMER)
170         {
171
172             if (keys[RIGHT])
173             {
174                 //both camera direction and camera plane must be rotated
175                 double oldDirX = dirX;
176                 dirX = dirX * cos(-rotSpeed) - dirY * sin(-rotSpeed);
177                 dirY = oldDirX * sin(-rotSpeed) + dirY * cos(-rotSpeed);
178                 double oldPlaneX = planeX;
179                 planeX = planeX * cos(-rotSpeed) - planeY * sin(-rotSpeed);
180                 planeY = oldPlaneX * sin(-rotSpeed) + planeY * cos(-rotSpeed);
181             }
182             else if (keys[LEFT])
183             {
184                 //both camera direction and camera plane must be rotated
185                 double oldDirX = dirX;
186                 dirX = dirX * cos(rotSpeed) - dirY * sin(rotSpeed);
187                 dirY = oldDirX * sin(rotSpeed) + dirY * cos(rotSpeed);
188                 double oldPlaneX = planeX;
189                 planeX = planeX * cos(rotSpeed) - planeY * sin(rotSpeed);
190                 planeY = oldPlaneX * sin(rotSpeed) + planeY * cos(rotSpeed);
191             }
192             if (keys[UP])
193             {
194                 if (worldMap[int(posX + dirX * moveSpeed)][int(posY)] == false) posX += dirX * moveSpeed;
195                 if (worldMap[int(posX)][int(posY + dirY * moveSpeed)] == false) posY += dirY * moveSpeed;
196             }
197             else if (keys[DOWN])
198             {
199                 if (worldMap[int(posX - dirX * moveSpeed)][int(posY)] == false) posX -= dirX * moveSpeed;
200                 if (worldMap[int(posX)][int(posY - dirY * moveSpeed)] == false) posY -= dirY * moveSpeed;
201             }
202
203             redraw = true;
204         }
205         if (redraw && al_is_event_queue_empty(eventQueue))
206         {
207             redraw = false;
208
209             al_draw_filled_rectangle(0, 0, w, h / 2, al_map_rgb(47, 79, 79));
210             al_draw_filled_rectangle(0, h / 2, w, h, al_map_rgb(112, 128, 144));
211
212             for (int x = 0; x < w; x++)
213             {
214
215                 //calculate ray position and direction
216                 double cameraX = 2 * x / double(w) - 1; //x-coordinate in camera space
217                 double rayPosX = posX;
218                 double rayPosY = posY;
219                 double rayDirX = dirX + planeX * cameraX;
220                 double rayDirY = dirY + planeY * cameraX;
221                 //which box of the map we're in
222                 int mapX = int(rayPosX);
223                 int mapY = int(rayPosY);
224
225                 //length of ray from current position to next x or y-side
226                 double sideDistX;
227                 double sideDistY;
228
229                 //length of ray from one x or y-side to next x or y-side
230                 double deltaDistX = sqrt(1 + (rayDirY * rayDirY) / (rayDirX * rayDirX));
231                 double deltaDistY = sqrt(1 + (rayDirX * rayDirX) / (rayDirY * rayDirY));
232                 double perpWallDist;
233
234                 //what direction to step in x or y-direction (either +1 or -1)
235                 int stepX;
236                 int stepY;
237
238                 int hit = 0; //was there a wall hit?
239                 int side; //was a NS or a EW wall hit?
240                 //calculate step and initial sideDist
241                 if (rayDirX < 0)
242                 {
243                     stepX = -1;
244                     sideDistX = (rayPosX - mapX) * deltaDistX;
245                 }
246                 else
247                 {
248                     stepX = 1;
249                     sideDistX = (mapX + 1.0 - rayPosX) * deltaDistX;
250                 }
251                 if (rayDirY < 0)
252                 {
253                     stepY = -1;
254                     sideDistY = (rayPosY - mapY) * deltaDistY;
255                 }
256                 else
257                 {
258                     stepY = 1;
259                     sideDistY = (mapY + 1.0 - rayPosY) * deltaDistY;
260                 }
261                 //perform DDA
262                 while (hit == 0)
263                 {
264                     //jump to next map square, OR in x-direction, OR in y-direction
265                     if (sideDistX < sideDistY)
266                     {
267                         sideDistX += deltaDistX;
268                         mapX += stepX;
269                         side = 0;
270                     }
271                     else
272                     {
273                         sideDistY += deltaDistY;
274                         mapY += stepY;
275                         side = 1;
276                     }
277                     //Check if ray has hit a wall
278                     if (worldMap[mapX][mapY] > 0) hit = 1;
279                 }
280                 //Calculate distance projected on camera direction (oblique distance will give fisheye effect!)
281                 if (side == 0) perpWallDist = (mapX - rayPosX + (1 - stepX) / 2) / rayDirX;
282                 else perpWallDist = (mapY - rayPosY + (1 - stepY) / 2) / rayDirY;
283
284                 //Calculate height of line to draw on screen
285                 int lineHeight = (int)(h / perpWallDist);
286
287                 //calculate lowest and highest pixel to fill in current stripe
288                 int drawStart = -lineHeight / 2 + h / 2;
289                 if (drawStart < 0)drawStart = 0;
290                 int drawEnd = lineHeight / 2 + h / 2;
291                 if (drawEnd >= h)drawEnd = h - 1;
292
293                 //texturing calculations
294                 int texNum = worldMap[mapX][mapY] - 1; //1 subtracted from it so that texture 0 can be used!
295
296                 //calculate value of wallX
297                 double wallX; //where exactly the wall was hit
298                 if (side == 0) wallX = rayPosY + perpWallDist * rayDirY;
299                 else wallX = rayPosX + perpWallDist * rayDirX;
300                 wallX -= floor((wallX));
301
302                 //x coordinate on the texture
303                 int texX = int(wallX * double(texWidth));
304                 if (side == 0 && rayDirX > 0) texX = texWidth - texX - 1;
305                 if (side == 1 && rayDirY < 0) texX = texWidth - texX - 1;
306
307                 //al_set_target_bitmap(buffer);
308                 for (int y = drawStart; y < drawEnd; y++)
309                 {
310                     int d = y * 256 - h * 128 + lineHeight * 128; //256 and 128 factors to avoid floats
311                     int texY = ((d * texHeight) / lineHeight) / 256;
312                     al_draw_pixel(x, y, al_get_pixel(texture[texNum], texX, texY));
313                 }
314             }
315
316             al_flip_display();
317             al_clear_to_color(al_map_rgb(0, 0, 0));
318         }
319     }
320
321     al_destroy_bitmap(texture[0]);
322     al_destroy_bitmap(texture[1]);
323     al_destroy_bitmap(texture[2]);
324     al_destroy_bitmap(texture[3]);
325     al_destroy_bitmap(texture[4]);
326     al_destroy_bitmap(texture[5]);
327
328     al_destroy_display(display);
329     al_destroy_event_queue(eventQueue);
330     al_destroy_timer(timer);
331
332     return 0;
333 }

jmasterx

Allegro 5 differs from old SDL in that it is hardware accelerated, and hardware-accelerated drawing performs badly on per-pixel operations.

Try locking your texture for write only, writing out the pixels manually, and then rendering the texture instead of issuing per-pixel draw operations. Make sure Allegro does not preserve the texture (the ALLEGRO_NO_PRESERVE_TEXTURE bitmap flag).

Chris Katko

Also try making 1x1 rectangles using the primitives addon. I know it sounds strange, but I've had better performance using that.

FlyingFromage

Thanks for the replies, but could you please explain what you mean by this? (I'm quite new to graphics and Allegro.)

jmasterx said:

write out the pixels manually, then render the texture instead of draw operations

Thanks for the help.

Edit: I tried using the primitives addon to draw rectangles instead, and that made the issue worse.

jmasterx

Right now, you are asking the graphics card to draw n pixels as rectangles, and it has to send the vertices and texture coordinates for each call to al_draw_pixel.

Allegro instead lets you create a texture using al_create_bitmap.
You then lock it (https://www.allegro.cc/manual/5/al_lock_bitmap), which puts it in RAM. Once it is in RAM, you access and write the individual pixels... in RAM.

You then release the lock, which sends the texture to the graphics card, and draw it to the screen. This should be much faster.

inline void write_pixel_argb_8888(ALLEGRO_LOCKED_REGION *region, int x, int y, ALLEGRO_COLOR &col)
{
  uint32_t *ptr32;
  unsigned char r, g, b, a;
  al_unmap_rgba(col, &r, &g, &b, &a);
  ptr32 = (uint32_t *)region->data + x + y*(region->pitch/4);
  *ptr32 = (a << 24) | (r << 16) | (g << 8) | b;
}
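
The function above can be exercised without Allegro by mimicking the two locked-region fields it relies on. The following is a minimal sketch, not jmasterx's attached implementation: `LockedRegion`, `pack_argb_8888`, `write_pixel` and `fill_column` are hypothetical names, and the struct only stands in for the `data` and `pitch` members of ALLEGRO_LOCKED_REGION. It assumes a 32-bit ARGB layout and applies the row offset in bytes, so it also works when the pitch is negative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the two ALLEGRO_LOCKED_REGION fields used here,
// so the sketch compiles without Allegro.
struct LockedRegion {
    void *data;  // pointer to the first pixel of row 0
    int pitch;   // bytes from one row to the next (may be negative)
};

// Pack 8-bit channels into one 32-bit ARGB value (alpha in the top byte).
inline uint32_t pack_argb_8888(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint32_t(a) << 24) | (uint32_t(r) << 16) | (uint32_t(g) << 8) | uint32_t(b);
}

// Write one pixel. The row offset is computed in bytes, so any pitch that is
// a multiple of 4 works, including a negative one.
inline void write_pixel(const LockedRegion &region, int x, int y, uint32_t argb)
{
    assert(region.pitch % 4 == 0);
    uint8_t *row = static_cast<uint8_t *>(region.data)
                 + static_cast<std::ptrdiff_t>(y) * region.pitch;
    reinterpret_cast<uint32_t *>(row)[x] = argb;
}

// Fill rows y0..y1-1 of one screen column, the access pattern a raycaster
// uses for each vertical wall stripe.
inline void fill_column(const LockedRegion &region, int x, int y0, int y1, uint32_t argb)
{
    for (int y = y0; y < y1; ++y)
        write_pixel(region, x, y, argb);
}
```

In a real program the `LockedRegion` values would come from the pointer returned by al_lock_bitmap, and the color would be packed once per texel rather than converted from an ALLEGRO_COLOR for every pixel.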

Peter Hull

If you could attach all the code as a zip that might help.

Just looking at the code, are you clear on the difference between al_put_pixel and al_draw_pixel? Locking the screen with ALLEGRO_LOCK_WRITEONLY should increase speed. Also, I think your bitmaps ought to be loaded as memory bitmaps, not video bitmaps, for fastest access. Another thing (I'm not sure if this is still true, or ever has been true, for Allegro 5!): if you create the display before loading the bitmaps, it will ensure they have the same pixel format.

Finally, I know that MSVC compiles in a lot of bounds checking if you use STL containers in debug mode - could that be an issue for you?

If none of that helps you will have to go to direct access as jmasterx suggests.

FlyingFromage

Thanks for all of the suggestions, guys. Unfortunately, I think I'm just too much of a novice to figure this out, because I've tried implementing all of your suggestions multiple times and either it's not working or I'm not doing it correctly (probably the latter). I appreciate the help though, thanks for your time.

Peter Hull

Don't give up now :-/

I had a quick try and I got it to run OK (admittedly not as quickly as I remember Wolfenstein running back in 1993 but that was only running at 320x200 IIRC)

I changed line 58 to use ALLEGRO_MEMORY_BITMAP and dropped the locking code on lines 69-70.
I changed al_draw_pixel to al_put_pixel on line 312.
Finally I added on line 211

al_lock_bitmap(al_get_backbuffer(display), ALLEGRO_PIXEL_FORMAT_ANY, ALLEGRO_LOCK_READWRITE);


and on line 315

Does that help any with your system?

GullRaDriel

If it's a per-pixel operation, why not use a pixel shader?

FlyingFromage

Thanks for the extra help, Peter Hull. I implemented everything that you said to do and it did help, but still not to the extent I'm looking for. I'm still trying to figure out how to implement jmasterx's solution, but I always end up with worse results (and weird artifacting).

jmasterx

Here is an implementation of what I was talking about.

I also did some basic optimizations to avoid multiplications in the tight loop.
https://www.allegro.cc/files/attachment/610374
I'm getting a great framerate compared to your original code.
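
The attachment itself is not reproduced in this thread, so the following is only a guess at what avoiding multiplications in the tight loop can look like. It is a minimal sketch with hypothetical function names: the first function repeats the per-pixel integer math from the posted code (lines 310-311), and the second computes the texture step once per column and then performs a single addition per pixel. It assumes an even screen height such as the 720 used above; at exact texel boundaries floating-point rounding may pick the neighbouring row.

```cpp
#include <vector>

// Texture row for every screen row of one wall stripe, computed as in the
// posted loop: a multiply and two divides per pixel.
std::vector<int> tex_rows_per_pixel(int lineHeight, int h, int texHeight)
{
    int drawStart = -lineHeight / 2 + h / 2;
    if (drawStart < 0) drawStart = 0;
    int drawEnd = lineHeight / 2 + h / 2;
    if (drawEnd >= h) drawEnd = h - 1;

    std::vector<int> rows;
    for (int y = drawStart; y < drawEnd; y++) {
        int d = y * 256 - h * 128 + lineHeight * 128;
        rows.push_back(((d * texHeight) / lineHeight) / 256);
    }
    return rows;
}

// Same mapping, but the step is computed once per column and the loop body
// only adds it to a running texture position.
std::vector<int> tex_rows_incremental(int lineHeight, int h, int texHeight)
{
    int drawStart = -lineHeight / 2 + h / 2;
    if (drawStart < 0) drawStart = 0;
    int drawEnd = lineHeight / 2 + h / 2;
    if (drawEnd >= h) drawEnd = h - 1;

    double step = double(texHeight) / lineHeight;  // texels per screen pixel
    double texPos = (drawStart - h / 2.0 + lineHeight / 2.0) * step;

    std::vector<int> rows;
    for (int y = drawStart; y < drawEnd; y++) {
        int texY = int(texPos);
        if (texY < 0) texY = 0;                         // guard against rounding
        if (texY > texHeight - 1) texY = texHeight - 1;
        rows.push_back(texY);
        texPos += step;
    }
    return rows;
}
```

For a wall exactly as tall as the texture (lineHeight 64 on a 720-pixel screen) both functions return rows 0 through 63 in order.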

Some of your logic that you do in the render could be optimized/cached in memory.

Those 2 square roots are not cheap.
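
One way to drop the square roots, sketched here under the assumption that the rest of the posted loop stays as it is: sqrt(1 + dy²/dx²) equals the ray's length times |1/dx|, and the same length factor appears in both deltaDistX and deltaDistY. The DDA only compares the two side distances, and the posted perpWallDist is computed from map coordinates rather than from them, so the common factor can be left out. The function names below are hypothetical.

```cpp
#include <cmath>

// Ray length per unit step along one axis, as in the posted code:
// sqrt(1 + other^2 / along^2).
inline double delta_dist_sqrt(double along, double other)
{
    return std::sqrt(1 + (other * other) / (along * along));
}

// The same quantity divided by the ray's length. Both axes are scaled by
// that same length, so "sideDistX < sideDistY" comparisons are unchanged.
inline double delta_dist_abs(double along)
{
    return std::fabs(1.0 / along);
}
```

Replacing lines 230-231 with `fabs(1 / rayDirX)` and `fabs(1 / rayDirY)` would then step through the same map squares in the same order.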

Mark Oates

One trick I used for my ray casting demo was to create vertical, 1px-wide sub-bitmap "slices" of each wall texture.

So rather than copying pixels, you just draw those sub-bitmaps when rendering a wall.
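
A rough sketch of the geometry behind this idea, not Mark Oates's actual code: for each screen column, one scaled draw of a 1px-wide source strip replaces the whole per-pixel loop. Instead of pre-made sub-bitmaps, this variant describes the strip as a source rectangle, which is a swapped-in simplification. `ColumnSlice` and `wall_slice` are hypothetical names; the fields are ordered to match the parameters of al_draw_scaled_bitmap.

```cpp
// Source and destination rectangles for one wall column.
struct ColumnSlice {
    float sx, sy, sw, sh;  // 1px-wide strip inside the wall texture
    float dx, dy, dw, dh;  // 1px-wide strip on screen
};

// Build the rectangles for screen column x, given the texture column texX
// and the projected wall height lineHeight on a screen of height h.
inline ColumnSlice wall_slice(int x, int texX, int lineHeight, int h, int texHeight)
{
    ColumnSlice s;
    s.sx = float(texX);
    s.sy = 0.0f;
    s.sw = 1.0f;
    s.sh = float(texHeight);
    s.dx = float(x);
    s.dy = float(h - lineHeight) / 2.0f;  // negative when the wall overflows the screen
    s.dw = 1.0f;
    s.dh = float(lineHeight);
    return s;
}

// With Allegro, the result would be passed along as:
//   al_draw_scaled_bitmap(texture, s.sx, s.sy, s.sw, s.sh,
//                         s.dx, s.dy, s.dw, s.dh, 0);
```

The destination rectangle is deliberately not clamped to the screen; the part that falls outside the target is clipped when drawing, so no drawStart/drawEnd bookkeeping is needed.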

FlyingFromage

Thanks for the help, all! With your implementation, jmasterx, there was an issue where the camera getting really close to a wall sometimes caused a massive performance drop for a couple of seconds, after which it would be just fine. I fixed that by not locking the texture bitmaps, which didn't noticeably affect general performance.

Thread #616285. Printed from Allegro.cc