Allegro.cc - Online Community

Allegro.cc Forums » Programming Questions » Greyscale-Filter is too slow. How to improve performance?

This thread is locked; no one can reply to it. rss feed Print
 1   2   3   4 
Greyscale-Filter is too slow. How to improve performance?
DieBacke
Member #5,548
February 2005

hi,
basically this is my filter:

1static void FilterMakeMonochrom(BITMAP* bmp)
2{
3 //make the bitmap of the child monochrom
4 acquire_bitmap(bmp);
5 int grey;
6 int color;
7 for (int x = 0; x <= SCREEN_W; x++)
8 {
9 for (int y = 0; y <= SCREEN_H; y++)
10 {
11 color = getpixel(bmp, x, y);
12 grey = ((getr(color) + getg(color) + getb(color)) / 3);
13 putpixel(bmp, x, y, makecol(grey, grey, grey));
14 };
15 };
16 release_bitmap(bmp);
17};

it is very slow... when I try it with lower resolutions, it is much smoother (1024*768 compared to 640*480 :)). So the part in the loop is not so well done by me... There must be a critical mistake, otherwise such a simple filter wouldn't lag so bad...

If I don't call acquire_bitmap() the performance doesn't get worse... shouldn't that lower fps to almost zero?

thx, ~~Backe

miran
Member #2,407
June 2002

Don't use generic getpixel, putpixel, makecol and getX functions! Use the colour depth specific ones instead. Actually you should access bitmap memory directly. That'll make your code several times faster.

EDIT: But at 1024x768 that will still be slow. You can probaly use OpenLayer to get the same effect. I've no idea how though...

--
sig used to be here

Kitty Cat
Member #2,815
October 2002
avatar

What type of bitmap is being passed? A memory bitmap? Video? If you know the depth, you can use the functions directly:

_getpixel32 // note the underscore!
_putpixel32 // note the underscore!
getr32
getg32
getb32
makecol32

Replace 32 with the depth you're using.

EDIT:
You can also use

bmp_select(bmp);
addr = bmp_read_line(bmp, y);
for(x)
   buf[x] = bmp_read32(addr+(x*4));

do_stuff_to_line(buf);

addr = bmp_write_line(bmp, y);
for(x)
   bmp_write32(addr+(x*4), buf[x]);

Check the exflame example for methods of writing directly to bitmaps.

--
"Do not meddle in the affairs of cats, for they are subtle and will pee on your computer." -- Bruce Graham

FrankyR
Member #243
April 2000
avatar

For some info on how to access BITMAP data directly check out: http://www.allegro.cc/manual/api/direct-access-to-video-memory/

Evert
Member #794
November 2000
avatar

In addition to what was said above, swapping the order of your loops may also help due to the way the bitmaps are layed out in memory.
If you can use an 8 bit image for this, you can get the effect very quickly by using a grey-scale palette.

gnolam
Member #2,030
March 2002
avatar

It should also be noted that the code you provided may not produce the result you want. In that case, you'll have to weight your color components: luminance

--
Move to the Democratic People's Republic of Vivendi Universal (formerly known as Sweden) - officially democracy- and privacy-free since 2008-06-18!

DieBacke
Member #5,548
February 2005

the colour depth specific getpixel and putpixel functions and the ((long *)bmp->line[y])[x] crash my program ??? in my IDE the program quits, and keeps the resolution, though. If I run the exe manually, the window just becomes black.
The bitmap passed is a memory bitmap. Nothing special, just a simple doublebuffer.
color depth is 32. The getX functions work better when color depth specific.

Richard Phipps
Member #1,632
November 2001
avatar

If it crashes then you are trying to read from a point outside the bitmap.

Elverion
Member #6,239
September 2005
avatar

I also find that, instead of using nested for loops, using a single for loop can benefit speed more than you would think. That is, instead of:

// maxa represents the max "width"
// maxb represents the max "height"
for(int b = 0; b < maxb; b++)
  for(int a = 0; a < maxa; a++)
    do_something();

use:

// maxa represents the max "width"
// maxb represents the max "height"
for(int i = 0; i < (maxb * maxa); i++)
{
  int b = i / maxa; // b represents vertical movement
  int a = i % maxa; // a represents horizontal movement
  
  do_something();
}

Of course, this doesn't always work(mostly due to optimizations where possible), and you probably knew this anyways.

I was thinking that, maybe, you could somehow use draw_lit_sprite to fake the whole grayscaling processes. I haven't used this function before, and don't know it's limitations.

--
SolarStrike Software - MicroMacro home - Automation software.

DieBacke
Member #5,548
February 2005

Quote:

If it crashes then you are trying to read from a point outside the bitmap.

Yes, your right.
Ok, this already does a pretty good job:

1static void FilterMakeMonochrom(BITMAP* bmp)
2{
3 //make the bitmap of the child monochrom
4 acquire_bitmap(bmp);
5 int grey;
6 int color;
7 for(int i = 0; i < (bmp->w * bmp->h); i++)
8 {
9 int x = i % bmp->w;
10 int y = i / bmp->w;
11
12 color = _getpixel32(bmp, x, y);
13 grey = ((getr32(color) + getg32(color) + getb32(color)) / 3);
14 ((long *)bmp->line[y])[x] = makecol32(grey, grey, grey);
15 }
16 release_bitmap(bmp);
17};

It still lags, but it's better.
How could one single loop improve performance? I tried it, it is not noticeable without counting the frames (I don't count the frames in my program)

Evert
Member #794
November 2000
avatar

Quote:

for(int i = 0; i < (bmp->w * bmp->h); i++)
{
int x = i % bmp->w;
int y = i / bmp->w;

Store bmp->w*bmp->h, and copy bmp->w and bmp->h to local variables (right now, you're doing a pointer dereference in your inner loop, which is slower than accessing a local variable).

The single loop might buy you some time because it safes one jump instruction at the end of the loop (the nested loops have two jumps, one for each loop), on the other hand, division and modulus are slow operations (but they can probably be pipelined and executed together, someone else will know).

Richard Phipps
Member #1,632
November 2001
avatar

Try this:

1 for (y = 0 ; y < bmp->h ; y++)
2 {
3 for (x = 0 ; x < bmp->w ; x++)
4 {
5 // Fast access to 4 bytes of memory bitmap data, i.e. one 32 bit pixel.
6 c = ((long *)bmp->line[y])[x];
7
8 // Split into colour components.
9 r = (c >> 16) & 0xFF;
10 g = (c >> 8) & 0xFF;
11 b = c & 0xFF;
12
13 // Quick but inaccurate greyscale conversion.
14 grey = ((r + g + b) / 3);
15
16 // Write the new value.
17 ((long *)bmp->line[y])[x] = (grey << 16 | grey << 8 | grey);
18 }
19 }

It can be optimised further, but I wanted you to see what was happening first.

Neil Walker
Member #210
April 2000
avatar

.3 * R + .59 * G + .11 * B

Isn't that much slower in a computer that can do 5 trillion calculations a second.

Neil.
MAME Cabinet Blog / AXL LIBRARY (a games framework) / AXL Documentation and Tutorial

wii:0356-1384-6687-2022, kart:3308-4806-6002. XBOX:chucklepie

Richard Phipps
Member #1,632
November 2001
avatar

True, but unless it needs to be accurate, the quick divide by 3 will look fine and be a little bit faster.

Kitty Cat
Member #2,815
October 2002
avatar

   c = ((long *)bmp->line[y])[x]; 

   // Split into colour components.
   // Quick but inaccurate greyscale conversion. 
   grey  = (c >> 16) & 0xFF;
   grey += (c >> 8) & 0xFF;
   grey += c & 0xFF;
   grey /= 3;

   // Write the new value.  
   ((long *)bmp->line[y])[x] = grey * ((1<<16) | (1<<8) | 1);

--
"Do not meddle in the affairs of cats, for they are subtle and will pee on your computer." -- Bruce Graham

Richard Phipps
Member #1,632
November 2001
avatar

Yep, that's the optimised version. Although you could put the grey bit in the pixel write and get rid of the grey variable altogether.

Jonatan Hedborg
Member #4,886
July 2004
avatar

A faster way would be to make a whole new set of bitmaps, which are greyscale. They could be automaticly generated on load and stored "together" (in the same class/struct) with their "parent" bitmap. Then you'd have some function which switch their pointers when you want to draw in greyscale.

-------
Sweden: Free from the shackles of Democracy since 2008-06-18!

Krzysztof Kluczek
Member #4,191
January 2004
avatar

Quote:

.3 * R + .59 * G + .11 * B

.25*R + .625*G + .125*B shouldn't be too much off. :)

gray =
    ((color&0xFC0000)>>18) +
    ((color&0x00FE00)>> 9) +
    ((color&0x00F800)>>11) +
    ((color&0x0000F8)>> 3);

gray|=(gray<<8)|(gray<<16);

And if .25*R + .5*G + .25*B is fine for you:

gray =
    ((color&0xFC0000)>>18) +
    ((color&0x00FE00)>> 9) +
    ((color&0x0000FC)>> 2);

gray|=(gray<<8)|(gray<<16);

Elverion
Member #6,239
September 2005
avatar

Quote:

.3 * R + .59 * G + .11 * B
.25*R + .625*G + .125*B

What's with the seemingly random decimal points? I could understand multiplying them all by 0.3, or even using a slightly different number for green, but where do .59, .11, .625, and .125 come from?

--
SolarStrike Software - MicroMacro home - Automation software.

Marcello
Member #1,860
January 2002
avatar

They come from empirical testing. r * 0.299f + g * 0.587f + b * 0.114f is what Photoshop uses, from what I've read.

miran
Member #2,407
June 2002

Quote:

They come from empirical testing. r * 0.299f + g * 0.587f + b * 0.114f is what Photoshop uses, from what I've read.

That's the standard set by some standard setting organization whose name is a French acronym I can't remember right now. .625, .5, .25, .125, etc. come from the fact that multiplying a number with them can be done with bitshifts.

--
sig used to be here

HoHo
Member #4,534
April 2004
avatar

For optimizing the loops a bit more, a good thing would be to do the operations on several bitmap lines in one cycle, probably 2 or 4 should be good options. That way compiler will probably mix those functions and CPU can execute several of those in parallel.
If done correctly I wouldn't be surprised if you would get two or more times faster execution with relatively modern CPU with good OoO execution :)

__________
In theory, there is no difference between theory and practice. But, in practice, there is - Jan L.A. van de Snepscheut
MMORPG's...Many Men Online Role Playing Girls - Radagar
"Is Java REALLY slower? Does STL really bloat your exes? Find out with your friendly host, HoHo, and his benchmarking machine!" - Jakub Wasilewski

Neil Walker
Member #210
April 2000
avatar

For a lot of colour effects I find converting to HSL much easier. One thing I've used in a game is convert to grey-scale but keep the orangey/red colours.

Neil.
MAME Cabinet Blog / AXL LIBRARY (a games framework) / AXL Documentation and Tutorial

wii:0356-1384-6687-2022, kart:3308-4806-6002. XBOX:chucklepie

Dustin Dettmer
Member #3,935
October 2003
avatar

I would just save the double loop / modu operator speed headaches all together and just do one loop.

for(int y = 0; y < bmp->h; y++)
 for(int x = 0; x < bmp->w; x++)
  c = ((long *)bmp->line[y])[x];

becomes

for(int i = bmp->w * bmp->h - 1; i >=0; i--)
 c = ((long *)bmp->line[0])<i>;

Evert said:

(but they can probably be pipelined and executed together, someone else will know).

Can someone confirm or deny this?

A J
Member #3,025
December 2002
avatar

dustin, your algo is wrong, and potentially dangerous.
you should not use the line[] beyond a single line.
not to mention that you also start out-of-bounds

HoHo, im intersetsed in your theory behind why 2 lines at a time would work quicker, care to elaborate

___________________________
The more you talk, the more AJ is right. - ML

 1   2   3   4 


Go to: