|
Greyscale-Filter is too slow. How to improve performance? |
DieBacke
Member #5,548
February 2005
|
hi,
it is very slow... when I try it with lower resolutions, it is much smoother (1024*768 compared to 640*480 ). So the part in the loop is not so well done by me... There must be a critical mistake, otherwise such a simple filter wouldn't lag so bad... If I don't call acquire_bitmap() the performance doesn't get worse... shouldn't that lower fps to almost zero? thx, ~~Backe
|
miran
Member #2,407
June 2002
|
Don't use generic getpixel, putpixel, makecol and getX functions! Use the colour depth specific ones instead. Actually you should access bitmap memory directly. That'll make your code several times faster. EDIT: But at 1024x768 that will still be slow. You can probaly use OpenLayer to get the same effect. I've no idea how though... -- |
Kitty Cat
Member #2,815
October 2002
|
What type of bitmap is being passed? A memory bitmap? Video? If you know the depth, you can use the functions directly: _getpixel32 // note the underscore! _putpixel32 // note the underscore! getr32 getg32 getb32 makecol32 Replace 32 with the depth you're using. EDIT: bmp_select(bmp); addr = bmp_read_line(bmp, y); for(x) buf[x] = bmp_read32(addr+(x*4)); do_stuff_to_line(buf); addr = bmp_write_line(bmp, y); for(x) bmp_write32(addr+(x*4), buf[x]); Check the exflame example for methods of writing directly to bitmaps. -- |
FrankyR
Member #243
April 2000
|
For some info on how to access BITMAP data directly check out: http://www.allegro.cc/manual/api/direct-access-to-video-memory/ |
Evert
Member #794
November 2000
|
In addition to what was said above, swapping the order of your loops may also help due to the way the bitmaps are layed out in memory. |
gnolam
Member #2,030
March 2002
|
It should also be noted that the code you provided may not produce the result you want. In that case, you'll have to weight your color components: luminance -- |
DieBacke
Member #5,548
February 2005
|
the colour depth specific getpixel and putpixel functions and the ((long *)bmp->line[y])[x] crash my program in my IDE the program quits, and keeps the resolution, though. If I run the exe manually, the window just becomes black.
|
Richard Phipps
Member #1,632
November 2001
|
If it crashes then you are trying to read from a point outside the bitmap. |
Elverion
Member #6,239
September 2005
|
I also find that, instead of using nested for loops, using a single for loop can benefit speed more than you would think. That is, instead of: // maxa represents the max "width" // maxb represents the max "height" for(int b = 0; b < maxb; b++) for(int a = 0; a < maxa; a++) do_something(); use: // maxa represents the max "width" // maxb represents the max "height" for(int i = 0; i < (maxb * maxa); i++) { int b = i / maxa; // b represents vertical movement int a = i % maxa; // a represents horizontal movement do_something(); } Of course, this doesn't always work(mostly due to optimizations where possible), and you probably knew this anyways. I was thinking that, maybe, you could somehow use draw_lit_sprite to fake the whole grayscaling processes. I haven't used this function before, and don't know it's limitations. -- |
DieBacke
Member #5,548
February 2005
|
Quote: If it crashes then you are trying to read from a point outside the bitmap.
Yes, your right.
It still lags, but it's better.
|
Evert
Member #794
November 2000
|
Quote:
for(int i = 0; i < (bmp->w * bmp->h); i++) Store bmp->w*bmp->h, and copy bmp->w and bmp->h to local variables (right now, you're doing a pointer dereference in your inner loop, which is slower than accessing a local variable). The single loop might buy you some time because it safes one jump instruction at the end of the loop (the nested loops have two jumps, one for each loop), on the other hand, division and modulus are slow operations (but they can probably be pipelined and executed together, someone else will know). |
Richard Phipps
Member #1,632
November 2001
|
Try this:
It can be optimised further, but I wanted you to see what was happening first. |
Neil Walker
Member #210
April 2000
|
.3 * R + .59 * G + .11 * B Isn't that much slower in a computer that can do 5 trillion calculations a second. Neil. wii:0356-1384-6687-2022, kart:3308-4806-6002. XBOX:chucklepie |
Richard Phipps
Member #1,632
November 2001
|
True, but unless it needs to be accurate, the quick divide by 3 will look fine and be a little bit faster. |
Kitty Cat
Member #2,815
October 2002
|
c = ((long *)bmp->line[y])[x]; // Split into colour components. // Quick but inaccurate greyscale conversion. grey = (c >> 16) & 0xFF; grey += (c >> 8) & 0xFF; grey += c & 0xFF; grey /= 3; // Write the new value. ((long *)bmp->line[y])[x] = grey * ((1<<16) | (1<<8) | 1);
-- |
Richard Phipps
Member #1,632
November 2001
|
Yep, that's the optimised version. Although you could put the grey bit in the pixel write and get rid of the grey variable altogether. |
Jonatan Hedborg
Member #4,886
July 2004
|
A faster way would be to make a whole new set of bitmaps, which are greyscale. They could be automaticly generated on load and stored "together" (in the same class/struct) with their "parent" bitmap. Then you'd have some function which switch their pointers when you want to draw in greyscale.
|
Krzysztof Kluczek
Member #4,191
January 2004
|
Quote: .3 * R + .59 * G + .11 * B .25*R + .625*G + .125*B shouldn't be too much off. gray = ((color&0xFC0000)>>18) + ((color&0x00FE00)>> 9) + ((color&0x00F800)>>11) + ((color&0x0000F8)>> 3); gray|=(gray<<8)|(gray<<16); And if .25*R + .5*G + .25*B is fine for you: gray = ((color&0xFC0000)>>18) + ((color&0x00FE00)>> 9) + ((color&0x0000FC)>> 2); gray|=(gray<<8)|(gray<<16);
________ |
Elverion
Member #6,239
September 2005
|
Quote:
.3 * R + .59 * G + .11 * B What's with the seemingly random decimal points? I could understand multiplying them all by 0.3, or even using a slightly different number for green, but where do .59, .11, .625, and .125 come from? -- |
Marcello
Member #1,860
January 2002
|
They come from empirical testing. r * 0.299f + g * 0.587f + b * 0.114f is what Photoshop uses, from what I've read. |
miran
Member #2,407
June 2002
|
Quote: They come from empirical testing. r * 0.299f + g * 0.587f + b * 0.114f is what Photoshop uses, from what I've read. That's the standard set by some standard setting organization whose name is a French acronym I can't remember right now. .625, .5, .25, .125, etc. come from the fact that multiplying a number with them can be done with bitshifts. -- |
HoHo
Member #4,534
April 2004
|
For optimizing the loops a bit more, a good thing would be to do the operations on several bitmap lines in one cycle, probably 2 or 4 should be good options. That way compiler will probably mix those functions and CPU can execute several of those in parallel. __________ |
Neil Walker
Member #210
April 2000
|
For a lot of colour effects I find converting to HSL much easier. One thing I've used in a game is convert to grey-scale but keep the orangey/red colours. Neil. wii:0356-1384-6687-2022, kart:3308-4806-6002. XBOX:chucklepie |
ImLeftFooted
Member #3,935
October 2003
|
I would just save the double loop / modu operator speed headaches all together and just do one loop. for(int y = 0; y < bmp->h; y++) for(int x = 0; x < bmp->w; x++) c = ((long *)bmp->line[y])[x]; becomes for(int i = bmp->w * bmp->h - 1; i >=0; i--) c = ((long *)bmp->line[0])<i>;
Evert said: (but they can probably be pipelined and executed together, someone else will know). Can someone confirm or deny this? |
A J
Member #3,025
December 2002
|
dustin, your algo is wrong, and potentially dangerous. HoHo, im intersetsed in your theory behind why 2 lines at a time would work quicker, care to elaborate ___________________________ |
|
|