trans_blender, way too slow

no-reply@allegro.cc (The Unknown) — Mon, 19 Mar 2007 07:02:14 +0000

In my game, to make pausing a little more... interesting, I used the trans blender to draw a translucent colour over the whole of the screen, and then cycle it through all of the colours of the rainbow... 'tis very nice.

But since I increased the screen size from 640*480 to 800*480 (for that sweet widescreenedness) it has become incredibly slow. No, really: CPU usage goes from 12% to 100%, and the framerate drops from 60 FPS to about 20 FPS. Not so nice anymore.

Is there any way I can speed up the process?

set_trans_blender(0, 0, 0, 75);

drawing_mode(DRAW_MODE_TRANS, 0, 0, 0);
rectfill(Buffer, 0, 0, 799, 479, makecol(Red, Green, Blue)); /* rectfill corners are inclusive */
drawing_mode(DRAW_MODE_SOLID, 0, 0, 0);

no-reply@allegro.cc (BAF) — Mon, 19 Mar 2007 07:09:08 +0000

Use something with hardware acceleration, like Open Layer.

no-reply@allegro.cc (kazzmir) — Mon, 19 Mar 2007 07:21:39 +0000

Or fblend

no-reply@allegro.cc (Onewing) — Mon, 19 Mar 2007 08:18:52 +0000

For tinting the screen in any color depth other than 8-bit using only vanilla Allegro, I've found that draw_lit_sprite can improve the framerate. It's not a major increase, but it will do if you don't want to tack on an add-on library (which would be the recommended solution).

no-reply@allegro.cc (Krzysztof Kluczek) — Mon, 19 Mar 2007 08:38:48 +0000

For 15-, 16- and 32-bit modes you can use something like the code below. It does a 50-50 average with the given color. The code below works in 32-bit modes, but you can write 15-bit and 16-bit versions quite easily. Note that it works only on memory bitmaps.

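/* Blend every pixel of a 32-bit memory bitmap 50/50 with `color`. */
void tint_bitmap(BITMAP *bmp, int color)
{
  /* Halve each channel of the tint once, outside the loops. */
  unsigned int half = ((unsigned int)color >> 1) & 0x7F7F7F;
  for (int y = 0; y < bmp->h; y++)
  {
    unsigned int *pixel = (unsigned int *)(bmp->line[y]);
    unsigned int *end = pixel + bmp->w;
    while (pixel < end)
    {
      /* The mask drops the bit shifted in from the neighbouring channel. */
      *pixel = ((*pixel >> 1) & 0x7F7F7F) + half;
      pixel++;
    }
  }
}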
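
A 16-bit version might look like the sketch below. It assumes the common RGB565 layout (5 red, 6 green, 5 blue bits) and works on a raw row of pixels rather than an Allegro BITMAP; the name tint_row_565 is made up for illustration. For a 15-bit (555) layout the same idea should work with the mask 0x3DEF.

```c
#include <stdint.h>
#include <stddef.h>

/* 50/50 tint for a row of RGB565 pixels (assumed layout: RRRRRGGGGGGBBBBB).
   After the shift, 0x7BEF clears bits 15, 10 and 4, i.e. the top bit of
   each field, so a bit shifted out of one channel cannot leak into the next. */
static void tint_row_565(uint16_t *pixel, size_t count, uint16_t color)
{
    uint16_t half_color = (uint16_t)((color >> 1) & 0x7BEF);
    for (size_t i = 0; i < count; i++)
        pixel[i] = (uint16_t)(((pixel[i] >> 1) & 0x7BEF) + half_color);
}
```

With a memory bitmap you would call this once per row, passing (uint16_t *)bmp->line[y] and bmp->w.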

no-reply@allegro.cc (Paul whoknows) — Mon, 19 Mar 2007 11:51:36 +0000

Use fblend! I've been using it for a few days now, and it's nice and fast, and very easy to use!

I don't know why, but Krzysztof's code is really fast!
I added it to my project and applied it to my buffer bitmap (640x480x32), and it only increased CPU usage by ~10%; fblend_rect_trans() under the same conditions added ~13%.
But of course I could be completely wrong; perhaps that's not a proper way to compare efficiency.

no-reply@allegro.cc (Krzysztof Kluczek) — Mon, 19 Mar 2007 14:16:12 +0000

Quote:

I don't know why, but Krzysztof's code is really fast!

It's just simple: it uses only basic operations and works on entire RGB triples at once. It surely also gains some speed from working directly with pointers. You can probably make it even faster by using MMX and operating on two pixels per iteration (MMX registers are 64 bits wide), or even four pixels at once in 15- and 16-bit modes.
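
To illustrate the two-pixels-per-iteration idea without writing MMX, here is a rough sketch that uses a plain 64-bit integer in place of an MMX register. That is a substitution, not actual MMX code, and the name tint_row_pairs is invented for the example. It assumes 32-bit pixels in a raw row.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Tint `count` 32-bit pixels 50/50 with `color`, two pixels per step.
   A uint64_t stands in for a 64-bit MMX register (an assumption made
   for portability, not real MMX). */
static void tint_row_pairs(uint32_t *pixel, size_t count, uint32_t color)
{
    const uint64_t mask = 0x007F7F7F007F7F7FULL;   /* 0x7F7F7F in both lanes */
    uint32_t half = (color >> 1) & 0x7F7F7F;
    uint64_t half2 = ((uint64_t)half << 32) | half;
    size_t i = 0;

    for (; i + 2 <= count; i += 2) {
        uint64_t two;
        memcpy(&two, &pixel[i], sizeof two);       /* alignment-safe load */
        two = ((two >> 1) & mask) + half2;
        memcpy(&pixel[i], &two, sizeof two);
    }
    if (i < count)                                 /* odd trailing pixel */
        pixel[i] = ((pixel[i] >> 1) & 0x7F7F7F) + half;
}
```

The mask clears the top byte of each lane, so the bit that the shift carries from one pixel into its neighbour is discarded, just as in the one-pixel version.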

Replacing the while loop with a basic "for" loop might make it a bit faster, but that depends on the compiler's ability to optimize it into a "loop" instruction.

The cool thing is that you can use the same approach for other effects: just work out how to express the operation with a few shifts, additions and other basic operations.
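
As one example of another operation built the same way, the sketch below blends roughly three quarters of the original pixel with one quarter of the tint color. The function name tint_quarter and the 0x00RRGGBB pixel layout are assumptions for illustration.

```c
#include <stdint.h>

/* Blend one 0x00RRGGBB pixel as about 3/4 original + 1/4 color.
   (x >> 2) & 0x3F3F3F is a per-channel divide by four, so each channel
   ends up as p - p/4 + c/4, which never exceeds 255 and never borrows
   from the neighbouring channel. */
static uint32_t tint_quarter(uint32_t p, uint32_t color)
{
    p &= 0xFFFFFF;                                  /* ignore the top byte */
    uint32_t quarter_p = (p >> 2) & 0x3F3F3F;
    uint32_t quarter_c = (color >> 2) & 0x3F3F3F;
    return p - quarter_p + quarter_c;
}
```

This is closer to a light tint (like the alpha of 75 out of 255 in the original question) than the 50-50 average is.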

no-reply@allegro.cc (HoHo) — Mon, 19 Mar 2007 14:39:08 +0000

Also unrolling it might give some boost. On Core2 based CPU's, using SSE would further give significant speed increase

no-reply@allegro.cc (Krzysztof Kluczek) — Mon, 19 Mar 2007 15:12:49 +0000

Quote:

Also unrolling it might give some boost.

You can't really unroll the entire loop, as its length depends on the bitmap width, but unrolling it a bit so that the loop deals with four pixels in a single iteration might be worth it. Unrolling it more won't make that much difference and will make loop code longer, which CPU might not like.
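
A four-pixel unroll could be sketched roughly like this, with a short tail loop for widths that are not a multiple of four. The name tint_row_unrolled is hypothetical, and whether it actually beats the simple loop depends on the compiler and CPU.

```c
#include <stdint.h>
#include <stddef.h>

/* 50/50 tint of a row of 32-bit pixels, four pixels per loop iteration. */
static void tint_row_unrolled(uint32_t *pixel, size_t count, uint32_t color)
{
    uint32_t half = (color >> 1) & 0x7F7F7F;
    uint32_t *end4 = pixel + (count & ~(size_t)3);  /* last full group of 4 */
    uint32_t *end = pixel + count;

    while (pixel < end4) {
        pixel[0] = ((pixel[0] >> 1) & 0x7F7F7F) + half;
        pixel[1] = ((pixel[1] >> 1) & 0x7F7F7F) + half;
        pixel[2] = ((pixel[2] >> 1) & 0x7F7F7F) + half;
        pixel[3] = ((pixel[3] >> 1) & 0x7F7F7F) + half;
        pixel += 4;
    }
    while (pixel < end) {                           /* 0 to 3 leftover pixels */
        *pixel = ((*pixel >> 1) & 0x7F7F7F) + half;
        pixel++;
    }
}
```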

Quote:

On Core2 based CPU's, using SSE would further give significant speed increase

You should be able to do it with SSE2 (Pentium 4).
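
For the curious, an SSE2 version might look something like the sketch below, handling four 32-bit pixels per iteration with compiler intrinsics. The function name tint_row_sse2 is made up, and the code falls back to the plain scalar loop when the compiler does not define __SSE2__. Treat it as an illustration rather than a tuned routine; it makes no promise of being faster on any given machine.

```c
#include <stdint.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* 50/50 tint of a row of 32-bit pixels, four at a time where SSE2 exists. */
static void tint_row_sse2(uint32_t *pixel, size_t count, uint32_t color)
{
    uint32_t half = (color >> 1) & 0x7F7F7F;
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i mask = _mm_set1_epi32(0x7F7F7F);
    const __m128i half4 = _mm_set1_epi32((int)half);
    for (; i + 4 <= count; i += 4) {
        /* Unaligned load/store, since a row pointer may not be 16-byte aligned. */
        __m128i p = _mm_loadu_si128((const __m128i *)(pixel + i));
        p = _mm_add_epi32(_mm_and_si128(_mm_srli_epi32(p, 1), mask), half4);
        _mm_storeu_si128((__m128i *)(pixel + i), p);
    }
#endif
    for (; i < count; i++)                          /* tail, or no SSE2 at all */
        pixel[i] = ((pixel[i] >> 1) & 0x7F7F7F) + half;
}
```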

no-reply@allegro.cc (HoHo) — Mon, 19 Mar 2007 16:32:09 +0000

Quote:

Unrolling it more won't make that much difference and will make loop code longer, which CPU might not like.

This is true, especially in 32bit.

Quote:

You should be able to do it with SSE2 (Pentium 4).

Yes, but clock-for-clock Core2 has twice the SSE throughput of P4 and K8.
On other CPUs, plain old MMX should give results comparable to SSE2.

no-reply@allegro.cc (Milan Mimica) — Mon, 19 Mar 2007 17:46:49 +0000

Use allegrogl.

no-reply@allegro.cc (GullRaDriel) — Mon, 19 Mar 2007 18:04:35 +0000

Use allegrogl.

no-reply@allegro.cc (Bob) — Mon, 19 Mar 2007 23:35:15 +0000

FBlend supports subbitmaps, uses memory bitmaps correctly, and needs to do more checks for things like 15 vs 16 vs 32 bit. Other than that, you're probably bandwidth bound (and not compute bound), so MMX/SSE would not help.

no-reply@allegro.cc (X-G) — Tue, 20 Mar 2007 00:41:46 +0000

Just to make sure... Buffer is a memory bitmap, and not a video bitmap, right? Because doing any kind of blending operation on a video bitmap without the aid of, say, OpenGL is going to be very painful for your FPS.

no-reply@allegro.cc (HoHo) — Tue, 20 Mar 2007 00:42:13 +0000

If you happen to use small bitmaps that fit in cache, you probably won't be that limited by bandwidth. 800x600 at 32-bit takes around 2 MB. If you have a CPU with a big cache, it might be worth using the more efficient SIMD instructions. Though if you already have a CPU with a big cache, it will probably be fast enough already.
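
To put numbers on the buffer sizes mentioned in this thread, a tiny helper like the one below is enough (the name bitmap_bytes is invented for the example). It gives 1,920,000 bytes (about 1.83 MiB) for 800x600 at 32-bit, 1,536,000 bytes for the 800x480 buffer from the original question, and 1,228,800 bytes for 640x480.

```c
#include <stddef.h>

/* Bytes needed for the pixel data of a w x h bitmap.
   The depth is rounded up to whole bytes, so 15-bit counts as 2 bytes. */
static size_t bitmap_bytes(size_t w, size_t h, size_t bits_per_pixel)
{
    return w * h * ((bits_per_pixel + 7) / 8);
}
```

Compare the result with your CPU's cache size to guess whether the whole buffer can stay in cache between frames.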