alpha-blending .. the fastest way?

fsFreak

Member #1,017

February 2001

I'm doing semi-transparent windows in my game, and I wonder if I can optimize this code, because it slows down the FPS when drawing many windows at the same time ...
I'm using 16-bit color mode.

code:
void WINDOW::draw_16(int x, int y, int width, int height,
                     int color, int alpha_start, int alpha_end)
{
    int virtual_offset = x + y * 640;
    int virtual_offset_add = 640 - width;
    short X;
    short Y;

    int r;
    int g;
    int b;

    int new_r;
    int new_g;
    int new_b;
    float alpha_div;
    float alpha = alpha_start;
    float alpha_factor = ((float)alpha_end - (float)alpha_start) / (float)height;
    // default-values for a good-looking blue-blended window
    // int alpha = 100;
    // int color = makecol16(0,0,255);
    // extract rgb components
    new_r = getr (color);
    new_g = getg (color);
    new_b = getb (color);
    // lock bitmap
    acquire_bitmap (virtualscreen);
    // get pointer to the 16-bit pixel data
    short *dst = (short *) virtualscreen->dat;
    // process all pixels
    for(Y = 0; Y < height; Y++)
    {
        for(X = 0; X < width; X++)
        {
            r = getr(dst[virtual_offset]);
            g = getg(dst[virtual_offset]);
            b = getb(dst[virtual_offset]);
            alpha_div = alpha / 255;
            dst[virtual_offset++] = makecol(
                r + (new_r - r) * alpha_div,
                g + (new_g - g) * alpha_div,
                b + (new_b - b) * alpha_div);
        }
        alpha += alpha_factor;
        virtual_offset += virtual_offset_add;
    }
    // release bitmap
    release_bitmap (virtualscreen);

    // draw border
    rect(virtualscreen, x, y, x + width - 1, y + height - 1, makecol(120, 120, 120));
    rect(virtualscreen, x+1, y+1, x + width - 2, y + height - 2, makecol(255, 255, 255));
    rect(virtualscreen, x+2, y+2, x + width - 3, y + height - 3, makecol(120, 120, 120));
}

any idea how to speed things up?

23yrold3yrold

Member #1,134

March 2001

I'm not sure, but should you be releasing the bitmap before those calls to rect()? Try moving that release_bitmap() call to after the rect()'s.

Of course, going pixel by pixel as you seem to be is going to be dog slow in any case ....

--
Software Development == Church Development
Step 1. Build it.
Step 2. Pray.

fsFreak

Member #1,017

February 2001

yeah of course .. releasing after the rect should do something .. how silly of me .. but anyhow, is there a better (faster ) way than going pixel by pixel?

Gabhonga

Member #1,247

February 2001

yeah. in 16bit mode you could read 2 pixels at once, process both, and then write them both back with a single 32bit store. and forget about all the creepy allegro functions like getr(): you'll need one routine for every colordepth...extract the color components manually (or by copy&pasting other extraction code, for example allegro's).
division is also a slow operation. the way I would blend two 24 bit colors:
a1 = alpha^0xff; a2 = alpha; //alpha ranges 0..255
colresultr = ((col1r*a1)>>8) + ((col2r*a2)>>8);
colresultg = ((col1g*a1)>>8) + ((col2g*a2)>>8);
colresultb = ((col1b*a1)>>8) + ((col2b*a2)>>8);
I have even faster code for 16 bit blending, but not on this pc. it reduces the number of multiplications to 2 by packing the 3 color components in a special manner into a single long int, so I can multiply them by a scalar within one multiplication (and add all 3 channels together in one addition).
I assume this code is still reasonably portable, at least on 386 and later x86 processors, but you could go even faster by using mmx or sse assembly routines.
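
As a rough illustration of the shift-based blend described above, here is a minimal C sketch. The function names blend_channel and blend_rgb24 and the 0xRRGGBB packing are assumptions made for the example, not anything from Allegro. Note that the two weights add up to 255 while the shift divides by 256, so results can come out a level or two darker than an exact blend.

```c
#include <stdint.h>

/* Blend one 8-bit channel without any division.
   alpha = 0 keeps (almost all of) c1, alpha = 255 gives (almost all of) c2. */
static uint8_t blend_channel(uint8_t c1, uint8_t c2, uint8_t alpha)
{
    unsigned a2 = alpha;
    unsigned a1 = alpha ^ 0xFFu;   /* same value as 255 - alpha */
    return (uint8_t)(((c1 * a1) >> 8) + ((c2 * a2) >> 8));
}

/* Blend two colors stored as 0xRRGGBB (hypothetical layout for this sketch). */
static uint32_t blend_rgb24(uint32_t col1, uint32_t col2, uint8_t alpha)
{
    uint32_t r = blend_channel((col1 >> 16) & 0xFF, (col2 >> 16) & 0xFF, alpha);
    uint32_t g = blend_channel((col1 >> 8) & 0xFF, (col2 >> 8) & 0xFF, alpha);
    uint32_t b = blend_channel(col1 & 0xFF, col2 & 0xFF, alpha);
    return (r << 16) | (g << 8) | b;
}
```

Each channel costs two multiplications, two shifts and one addition, which is the baseline the packed 16-bit trick improves on.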

--------------------------------------------------------
sigs suck

Fladimir da Gorf

Member #1,565

October 2001

OK, but hey, plz ppl tell me about mmx and sse!! I don't really know how they work ( or I have some idea but I'm sure it's not as powerful thingy as u can offer ) So does anyone know something about them??

OpenLayer has reached a random SVN version number ;) | Online manual | Installation video!| MSVC projects now possible with cmake | Now alvailable as a Dev-C++ Devpack! (Thanks to Kotori)

Bob

Free Market Evangelist

September 2000

<shameless self promotion>
You should use FBlend!
It uses MMX and SSE to achieve very fast blending in 15 and 16bpp. The trans blender is what you need.
</shameless>

--
- Bob
[ -- All my signature links are 404 -- ]

orz

Member #565

August 2000

your inner loop has this code:
code:
r = getr(dst[virtual_offset]);
g = getg(dst[virtual_offset]);
b = getb(dst[virtual_offset]);
alpha_div = alpha / 255;
dst[virtual_offset++] = makecol(
r + (new_r - r) * alpha_div,
g + (new_g - g) * alpha_div,
b + (new_b - b) * alpha_div);

some changes to make:
1. The most obvious: move the line "alpha_div = alpha / 255;" out of the
inner loop. A good compiler might do this for you in this example, but in more
complicated examples the compiler might not have enough information.
2. Nit-picky details... since you seem to be hardwired to 16bpp anyway (though that
could also work on 15bpp...), you can replace the getr, getg, getb, and makecol
with their 16bpp specific versions.
3. Maybe I'm confused, because it's been a little while since I've done much with
Allegro, but I think that your acquire_bitmap and release_bitmap pair
aren't doing much... they're supposed to only do things if it's a video bitmap, and
your direct memory stuff won't work on video bitmaps anyway, so they are never
useful in this code. If you want to make this able to write to video bitmaps you need
to do extra stuff like bmp_select, bmp_write_line, bmp_write16, etc.
4. In that loop you multiply by alpha_div 3 times... which isn't too expensive... but
using the results in a makecol() call converts them to ints right afterwards. You
might be better off starting with ints, and doing something like this instead:
code:
//second to innermost loop
int new_r2, new_g2, new_b2, alpha2;
new_r2 = (int)(new_r * 256 * alpha / 255.0);
new_g2 = (int)(new_g * 256 * alpha / 255.0);
new_b2 = (int)(new_b * 256 * alpha / 255.0);
alpha2 = (int)((255-alpha) * (256.0/255.0));
//now the innermost loop
for(X = 0; X < width; X++)
{
    r = getr16(dst[virtual_offset]);
    g = getg16(dst[virtual_offset]);
    b = getb16(dst[virtual_offset]);
    dst[virtual_offset++] = makecol16(
        (new_r2 + r * alpha2) >> 8,
        (new_g2 + g * alpha2) >> 8,
        (new_b2 + b * alpha2) >> 8);
}

actually, on second thought... maybe that's kinda complicated... I think I premultiplied
the new color, used a little fixed point math, and got everything into integers
before the inner loop. I haven't tested this code... (BTW, the >>8 just means a
division by 256... it's really fast, but it only works on integers... it also rounds
slightly differently than the normal division if the integers are negative, but
that shouldn't happen here)
5. Anyway, it can be done faster with MMX (I don't know SSE, but maybe that also), but
you're probably better off just using FBlend or whatever.
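
Points 1, 2 and 4 above could be combined roughly as in the following sketch, which works on a raw row of 16-bit pixels instead of going through Allegro. The name blend_row565 and the fixed pixel layout (red in the top 5 bits, green in the middle 6, blue in the low 5) are assumptions for the example; Allegro's real channel order depends on the graphics driver, so treat this as an illustration rather than a drop-in replacement.

```c
#include <stddef.h>
#include <stdint.h>

/* Blend `count` RGB565 pixels in `row` toward the 8-bit color (nr, ng, nb).
   alpha is 0..255. Everything that does not change per pixel is computed
   once, so the inner loop only uses integer multiplies, adds and shifts. */
void blend_row565(uint16_t *row, size_t count, int nr, int ng, int nb, int alpha)
{
    /* 8.8 fixed-point weights; wa + wb is always 256 */
    int wa = (alpha * 256 + 127) / 255;   /* weight of the new color */
    int wb = 256 - wa;                    /* weight of the old pixel */

    /* premultiply the new color, reduced to 5/6/5 bits */
    int pr = (nr >> 3) * wa;
    int pg = (ng >> 2) * wa;
    int pb = (nb >> 3) * wa;

    for (size_t i = 0; i < count; i++) {
        int c = row[i];
        int r = (c >> 11) & 0x1F;
        int g = (c >> 5) & 0x3F;
        int b = c & 0x1F;
        r = (pr + r * wb) >> 8;
        g = (pg + g * wb) >> 8;
        b = (pb + b * wb) >> 8;
        row[i] = (uint16_t)((r << 11) | (g << 5) | b);
    }
}
```

In the original routine the alpha changes once per scanline, so the weights and the premultiplied color would be recomputed once per row, just before calling something like this.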
[ October 17, 2001: Message edited by: orz ]

Korval

Member #1,538

September 2001

Why are you doing alpha-blending in software anyway? Why not let the hardware handle it?

fsFreak

Member #1,017

February 2001

Thanks a lot for all intelligent answers ...
I'll get right to the optimizations
korval:
uh.. do all cards support it? if so, how do I do it? is it in the allegro docs?
Graphics programming isn't really my thing :/

Bob

Free Market Evangelist

September 2000

Korval: to use hardware translucency, he'd have to switch to 3D (either OpenGL or Direct3D), both of which are a pain to use if you want to do regular 2D stuff. Simply have a look at AllegroGL's Allegro driver. We try to implement as many Allegro functions as we can, but some of them are just too annoying (floodfill anyone?)

--
- Bob
[ -- All my signature links are 404 -- ]

Korval

Member #1,538

September 2001

I assumed that hardware blitting would support alpha transparency if available.

And, I also assumed that if it didn't, then Allegro would use an assembly optimized blitting loop.

Gabhonga

Member #1,247

February 2001

for coding for mmx or sse you need to learn assembler, and that's maybe a bit too long topic for this messageboard. but anyways, here's some (at least faster than yours) 16bit c(++) blending loop utilizing the packing operations I spoke of above:

#include <stdint.h>
#include <allegro.h>

typedef uint32_t dword; //must be exactly 32 bits wide

//spread a 565 pixel: green moves up to bits 21..26, red and blue stay in the low word
static inline void unpack(dword& c)    { c = ((c&0x7E0)<<16) | (c&0xF81F); }
//fold the green field back into the low word
static inline void repack(dword& c)    { c = (c&0xF81F) | ((c>>16)&0x7E0); }
//throw away the fraction bits left over after the shift
static inline void cleanpack(dword& c) { c &= 0x7E0F81F; }

//scales all 3 channels of one 565 pixel by a/32 with a single multiplication, a ranges 0..31
static inline dword scale16(dword c, dword a)
{ unpack(c); c*=a; c>>=5; cleanpack(c); repack(c); return c; }

//blends a mem bitmap onto another
//both must have same size, and their lines must be contiguous in memory
//alpha range 0..255 (x and y are not used by this version)
void blend16noclip(BITMAP* from, BITMAP* to, dword x, dword y, dword alpha)
{ unsigned char* pf = (unsigned char*)from->line[0];
  unsigned char* pt = (unsigned char*)to->line[0];
  dword ecx = from->h*from->w;  alpha >>= 3;  //more than 5 bits precision would overflow
  dword recal = alpha ^ 0x1F; //recal = 31-alpha
  if(ecx && ((uintptr_t)pt&3)) //at least eliminate misaligned writes; also checking the alignment between pf and pt would make this routine even bigger
  { dword c1 = scale16(*((uint16_t*)pf), alpha);
    dword c2 = scale16(*((uint16_t*)pt), recal);
    *((uint16_t*)pt) = (uint16_t)(c1+c2);
    pt+=2; pf+=2; ecx--; }
  for(; ecx>1; ecx-=2)
  { dword c1 = *((dword*)pf); //fetch two pixels. the first is in the lower word, the second in the upper
    dword c2 = *((dword*)pt); //fetch dest pixels, these get scaled by the inverse alpha
    c1 = scale16(c1&0xFFFF, alpha) | (scale16(c1>>16, alpha)<<16);
    c2 = scale16(c2&0xFFFF, recal) | (scale16(c2>>16, recal)<<16);
    *((dword*)pt) = c1+c2; //add source and dest, store 2 pixels back
    pt+=4; pf+=4; //advance both pointers by 4 bytes
  }
  if(ecx) //still a pixel left
  { dword c1 = scale16(*((uint16_t*)pf), alpha);
    dword c2 = scale16(*((uint16_t*)pt), recal);
    *((uint16_t*)pt) = (uint16_t)(c1+c2); }
}

phew, I really hope there's no error inside, because I've just hacked it in, but I'm quite sure it works; I've already tested lots of similar code.
anyway, this should give you some more little ideas about optimizing such routines.
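
For readers who want the packing idea on its own, without the bitmap handling, here is a small self-contained C sketch of blending two RGB565 pixels with two multiplications in total. The function name blend565_packed and the 0..32 alpha range are assumptions chosen for this example (the post above uses 0..31); with 0..32 the two weights add up to exactly 32, so alpha 0 returns the destination unchanged and alpha 32 returns the source unchanged.

```c
#include <stdint.h>

/* Blend two RGB565 pixels. alpha is the weight of src, in 0..32.
   Each pixel is spread out to 0x07E0F81F form (green moved to the upper
   half) so every channel has enough empty bits above it to hold the
   product with a value up to 32 without spilling into its neighbor. */
uint16_t blend565_packed(uint16_t src, uint16_t dst, uint32_t alpha)
{
    uint32_t s = (src | ((uint32_t)src << 16)) & 0x07E0F81Fu;
    uint32_t d = (dst | ((uint32_t)dst << 16)) & 0x07E0F81Fu;

    /* one multiplication per pixel scales red, green and blue together */
    uint32_t r = ((s * alpha + d * (32u - alpha)) >> 5) & 0x07E0F81Fu;

    /* fold green back down next to red and blue */
    return (uint16_t)(r | (r >> 16));
}
```

An 8-bit alpha could be mapped to this range with something like (alpha + 4) >> 3, at the cost of only 33 distinct blend levels.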
~~gabhonga~~

--------------------------------------------------------
sigs suck

kdevil

Member #1,075

March 2001

This thread is a nightmare to read.

-----
"I am the Black Mage! I casts the spells that makes the peoples fall down!"

Bob

Free Market Evangelist

September 2000

orz (or matthew?), would you mind inserting line breaks in your code?
Thanks.

--
- Bob
[ -- All my signature links are 404 -- ]

orz

Member #565

August 2000

1. If there's a way for me to edit my post, I don't see it.
2. Um... when I wrote that message, there were a few more line breaks in it,
at least in the code section at the top...
It would be nice if the editor displayed things with some resemblance to how they are
going to appear. Or at least offered a preview button.

Bob

Free Market Evangelist

September 2000

There's a little icon on the top left that looks like a crayon with a paper.

--
- Bob
[ -- All my signature links are 404 -- ]

orz

Member #565

August 2000

I see a message icon at the top left of each post, but it's not a link.
I see a few more icons to the right of that, one of which looks like a pencil with a paper,
but that's the reply button... nothing else I see looks anything like that.

Trumgottist

Member #95

April 2000

Yep. The pen and paper icon right next to the profile and private message icons is the one. It's for editing, not replying - try it. (Or simply move the mouse over it to get a pop up description of it: "Edit/Delete Post" At least in IE that's true.)

--
"I always prefer to believe the best of everybody - it saves so much time." - Rudyard Kipling

Play my game: Frasse and the Peas of Kejick

orz

Member #565

August 2000

yes, I feel stupid now. I coulda sworn I tried that button several times and it was used to reply...
anyway, I've fixed the obnoxious formatting of my post.