Allegro.cc - Online Community

trans_blender, way too slow
The Unknown
Member #8,441
March 2007

In my game, to make pausing a little more interesting, I used the trans blender to draw a translucent colour over the whole screen and then cycle through all of the colours of the rainbow... 'tis very nice.

But since I increased the screen size from 640*480 to 800*480 (for that sweet widescreenedness) it runs incredibly slowly. No really: CPU usage goes from 12% to 100%, and the framerate drops from 60 FPS to about 20 FPS. Not so nice anymore.

Is there any way I can speed up the process?

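/* Blend at alpha 75 out of 255 */
set_trans_blender(0, 0, 0, 75);

drawing_mode(DRAW_MODE_TRANS, NULL, 0, 0);
/* rectfill() corners are inclusive, so the last pixel is at (799, 479) */
rectfill(Buffer, 0, 0, 799, 479, makecol(Red, Green, Blue));
drawing_mode(DRAW_MODE_SOLID, NULL, 0, 0);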

BAF
Member #2,981
December 2002

Use something with hardware acceleration, like OpenLayer.

kazzmir
Member #1,786
December 2001

Onewing
Member #6,152
August 2005

For tinting the screen in any colour depth other than 8-bit using only vanilla Allegro, I've found that draw_lit_sprite can increase the framerate. It is not a major increase, but it will work better if you don't want to tack on an add-on library (which will be the recommended solution).

------------
Solo-Games.org | My Tech Blog: The Digital Helm

Krzysztof Kluczek
Member #4,191
January 2004

For 15-, 16- and 32-bit modes you can use something like the code below. It does a 50-50 average with the given color. The code below works in 32-bit modes, but you can write 15-bit and 16-bit versions quite easily. Note that it works only for memory bitmaps. :)

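/* Averages every pixel of a 32-bit memory bitmap 50/50 with `color`. */
void tint_bitmap(BITMAP *bmp, int color)
{
  /* Halve each channel of the tint once; the mask drops bits that the
     shift pushed across a channel boundary. */
  unsigned int half = ((unsigned int)color >> 1) & 0x7F7F7F;
  for (int y = 0; y < bmp->h; y++)
  {
    unsigned int *pixel = (unsigned int *)(bmp->line[y]);
    unsigned int *end = pixel + bmp->w;
    while (pixel < end)
    {
      /* Halve the pixel the same way, then add the halved tint.
         Each channel is at most 127 + 127, so nothing overflows. */
      *pixel = ((*pixel >> 1) & 0x7F7F7F) + half;
      pixel++;
    }
  }
}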
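For the 15-bit and 16-bit versions mentioned above, here is a rough sketch of how the masks could look. It works on a plain row of 16-bit pixels rather than on a BITMAP, so the function names and the row/count parameters are my own assumptions, not Allegro API. In Allegro you would presumably pass each bmp->line[y] cast to a 16-bit pointer, with bmp->w as the count.

```c
#include <stddef.h>
#include <stdint.h>

/* Average each 16-bit (5-6-5) pixel 50/50 with `color` (also 5-6-5).
 * After shifting right by one, the lowest bit of each channel ends up in
 * the top bit of the channel below it. The mask 0x7BEF clears bits 15, 10
 * and 4, which are exactly those stray bits. */
void tint_row_565(uint16_t *pixels, size_t count, uint16_t color)
{
    const uint16_t half_color = (uint16_t)((color >> 1) & 0x7BEF);
    for (size_t i = 0; i < count; i++)
        pixels[i] = (uint16_t)(((pixels[i] >> 1) & 0x7BEF) + half_color);
}

/* Same idea for 15-bit (5-5-5) pixels: the stray bits are 14, 9 and 4,
 * so the mask becomes 0x3DEF. */
void tint_row_555(uint16_t *pixels, size_t count, uint16_t color)
{
    const uint16_t half_color = (uint16_t)((color >> 1) & 0x3DEF);
    for (size_t i = 0; i < count; i++)
        pixels[i] = (uint16_t)(((pixels[i] >> 1) & 0x3DEF) + half_color);
}
```

Both masks are symmetric in the red and blue fields, so they should work whether the mode stores pixels as RGB or BGR.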

Paul whoknows
Member #5,081
September 2004

Use FBlend! I have been using it for a few days, and it works nicely and fast, and it is very easy to use!

I don't know why, but Krzysztof's code is really fast!
I added it to my project and applied it to my buffer bitmap (640x480x32), and it only added ~10% CPU usage; fblend_rect_trans() under the same conditions added ~13%.
But of course, I could be completely wrong; perhaps that's not a proper way to compare efficiency.

____

"The unlimited potential has been replaced by the concrete reality of what I programmed today." - Jordan Mechner.

Krzysztof Kluczek
Member #4,191
January 2004

Quote:

I don't know why, but Krzysztof's code is really fast!

It's just simple: it uses only basic operations and works on entire RGB triples at once. It surely also gets some speed from working directly with pointers. You can probably make it even faster by using MMX and operating on two pixels in every iteration (MMX registers are 64 bits wide), or even on four pixels at once in 15- and 16-bit modes. :)
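To illustrate the two-pixels-per-iteration idea without writing actual MMX, here is a sketch that uses an ordinary 64-bit integer as a stand-in for an MMX register. This is plain C, not MMX code, and the function name and the row/count parameters are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 50/50 tint of a row of 32-bit pixels, two pixels per 64-bit word. */
void tint_row_pairs(uint32_t *pixels, size_t count, uint32_t color)
{
    const uint32_t half  = (color >> 1) & 0x7F7F7Fu;
    const uint64_t half2 = ((uint64_t)half << 32) | half;  /* tint in both lanes */
    const uint64_t mask2 = 0x007F7F7F007F7F7FULL;          /* mask in both lanes */
    size_t i = 0;

    for (; i + 2 <= count; i += 2) {
        uint64_t two;
        memcpy(&two, &pixels[i], sizeof two);   /* load two pixels at once */
        /* The shift leaks the low bit of the upper pixel into the lower
         * pixel's top byte; the mask clears it again. */
        two = ((two >> 1) & mask2) + half2;
        memcpy(&pixels[i], &two, sizeof two);   /* store both pixels */
    }
    if (i < count)                              /* odd pixel left over */
        pixels[i] = ((pixels[i] >> 1) & 0x7F7F7Fu) + half;
}
```

Both lanes get identical treatment, so byte order does not matter here. Whether this beats the one-pixel loop depends on the compiler and CPU, so it is worth measuring.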

Replacing the loop condition with a basic "for" loop can make it a bit faster, but that depends on the compiler's ability to optimize it into a "loop" instruction. :)

The cool thing is that you can use the same approach for other effects: just work out how to do the job with a few shifts, additions and other basic operations. :)
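As one example of another operation built the same way, here is a sketch of a 75/25 blend (three quarters of the pixel, one quarter of the tint) for 32-bit pixels. The function name is hypothetical; the idea is that 3/4 is 1/2 + 1/4, and each fraction is a shift plus a mask.

```c
#include <stdint.h>

/* Blend 75% of `pixel` with 25% of `color`, all three channels at once.
 * Per channel the result is at most 127 + 63 + 63 = 253, so no channel
 * can carry into its neighbour. */
uint32_t blend_75_25(uint32_t pixel, uint32_t color)
{
    uint32_t half    = (pixel >> 1) & 0x7F7F7Fu;  /* pixel / 2 per channel */
    uint32_t quarter = (pixel >> 2) & 0x3F3F3Fu;  /* pixel / 4 per channel */
    uint32_t tint    = (color >> 2) & 0x3F3F3Fu;  /* color / 4 per channel */
    return half + quarter + tint;
}
```

Each shift truncates, so the result can come out a step or two darker than an exact blend, which is usually invisible for a full-screen tint.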

HoHo
Member #4,534
April 2004

Also unrolling it might give some boost. On Core2-based CPUs, using SSE would give a further significant speed increase :)

__________
In theory, there is no difference between theory and practice. But, in practice, there is - Jan L.A. van de Snepscheut
MMORPG's...Many Men Online Role Playing Girls - Radagar
"Is Java REALLY slower? Does STL really bloat your exes? Find out with your friendly host, HoHo, and his benchmarking machine!" - Jakub Wasilewski

Krzysztof Kluczek
Member #4,191
January 2004

Quote:

Also unrolling it might give some boost.

You can't really unroll the entire loop, as its length depends on the bitmap width, but unrolling it a bit so that the loop deals with four pixels in a single iteration might be worth it. Unrolling it more won't make that much difference and will make the loop code longer, which the CPU might not like. :)
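A four-pixel unroll could look roughly like the sketch below. It works on a plain row of 32-bit pixels; the function name and parameters are assumptions for the example. The second loop handles the up to three pixels left over when the width is not a multiple of four.

```c
#include <stddef.h>
#include <stdint.h>

/* 50/50 tint of one row, unrolled to four pixels per iteration. */
void tint_row_unrolled(uint32_t *pixels, size_t count, uint32_t color)
{
    const uint32_t half = (color >> 1) & 0x7F7F7Fu;
    uint32_t *p    = pixels;
    uint32_t *end4 = pixels + (count & ~(size_t)3);  /* last multiple of 4 */
    uint32_t *end  = pixels + count;

    while (p < end4) {
        p[0] = ((p[0] >> 1) & 0x7F7F7Fu) + half;
        p[1] = ((p[1] >> 1) & 0x7F7F7Fu) + half;
        p[2] = ((p[2] >> 1) & 0x7F7F7Fu) + half;
        p[3] = ((p[3] >> 1) & 0x7F7F7Fu) + half;
        p += 4;
    }
    while (p < end) {                                /* 0 to 3 leftovers */
        *p = ((*p >> 1) & 0x7F7F7Fu) + half;
        p++;
    }
}
```

A modern optimizing compiler may well unroll the simple loop on its own, so this is only worth keeping if a measurement shows a gain.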

Quote:

On Core2-based CPUs, using SSE would give a further significant speed increase :)

You should be able to do it with SSE2 (Pentium 4). :)
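A sketch of what that might look like with SSE2 intrinsics, four 32-bit pixels per step. The function name and the row/count parameters are assumptions for the example. When the compiler does not define __SSE2__, the code falls back to the plain scalar loop, so it still builds everywhere.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* 50/50 tint of one row of 32-bit pixels, four pixels per SSE2 step. */
void tint_row_sse2(uint32_t *pixels, size_t count, uint32_t color)
{
    const uint32_t half = (color >> 1) & 0x7F7F7Fu;
    size_t i = 0;

#if defined(__SSE2__)
    const __m128i mask  = _mm_set1_epi32(0x7F7F7F);
    const __m128i half4 = _mm_set1_epi32((int)half);
    for (; i + 4 <= count; i += 4) {
        /* Unaligned load/store, since the row address is not known
         * to be 16-byte aligned. */
        __m128i v = _mm_loadu_si128((const __m128i *)&pixels[i]);
        v = _mm_add_epi32(_mm_and_si128(_mm_srli_epi32(v, 1), mask), half4);
        _mm_storeu_si128((__m128i *)&pixels[i], v);
    }
#endif
    for (; i < count; i++)   /* leftovers, or everything without SSE2 */
        pixels[i] = ((pixels[i] >> 1) & 0x7F7F7Fu) + half;
}
```

With GCC or Clang on 32-bit x86 you would probably need -msse2 for the SSE2 path to be compiled in; on x86-64 it is on by default.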

HoHo
Member #4,534
April 2004

Quote:

Unrolling it more won't make that much difference and will make the loop code longer, which the CPU might not like.

This is true, especially in 32-bit.

Quote:

You should be able to do it with SSE2 (Pentium 4).

Yes, but clock-for-clock Core2 has twice the SSE throughput of P4 and K8 ;)
On other CPUs, plain old MMX should give results comparable to SSE2.

Milan Mimica
Member #3,877
September 2003

Use AllegroGL.

GullRaDriel
Member #3,861
September 2003

Use AllegroGL.

"Code is like shit - it only smells if it is not yours"
Allegro Wiki, full of examples and articles !!

Bob
Free Market Evangelist
September 2000

FBlend supports sub-bitmaps, handles memory bitmaps correctly, and needs to do more checks for things like 15- vs 16- vs 32-bit. Other than that, you're probably bandwidth bound (and not compute bound), so MMX/SSE would not help.

--
- Bob
[ -- All my signature links are 404 -- ]

X-G
Member #856
December 2000

Just to make sure... Buffer is a memory bitmap, and not a video bitmap, right? Because doing any kind of blending operation on a video bitmap without the aid of, say, OpenGL is going to be very painful for your FPS.

--
Since 2008-Jun-18, democracy in Sweden is dead. | 悪霊退散!悪霊退散!怨霊、物の怪、困った時は ドーマン!セーマン!ドーマン!セーマン! 直ぐに呼びましょう陰陽師レッツゴー!

HoHo
Member #4,534
April 2004

If you happen to use small bitmaps that fit in cache, you probably won't be that limited by bandwidth. 800x600 at 32-bit takes around 2 MB. If you have a CPU with a big cache, it might be worth using more efficient SIMD instructions. Though if you already have a CPU with a big cache, it will probably be fast enough already :P
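The arithmetic behind that estimate, as a small sketch. The helper names are made up for the example, and the cache sizes you compare against are whatever your own CPU has; none of this comes from Allegro.

```c
#include <stddef.h>

/* Bytes needed for one frame; 15-bit modes still use 2 bytes per pixel,
 * hence the rounding up. */
size_t frame_bytes(size_t width, size_t height, size_t bits_per_pixel)
{
    return width * height * ((bits_per_pixel + 7) / 8);
}

/* Returns 1 if a whole frame fits in a cache of `cache_bytes`, else 0. */
int fits_in_cache(size_t frame, size_t cache_bytes)
{
    return frame <= cache_bytes;
}
```

For 800x600 at 32-bit this gives 1,920,000 bytes (about 1.83 MiB), and for the 800x480 buffer from the first post 1,536,000 bytes (about 1.46 MiB). The first would just fit in a 2 MiB cache if nothing else competed for it, but not in a 512 KiB one.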