Any ideas on how to speed this up?

Madhura

Member #16,060

August 2015

This code creates a circle that blends more and more into the background as it radiates out from the center, using additive blending. I have done what I can think of to speed this code up, and it is really fast now, but since this is one of the main bottlenecks I want to squeeze whatever else I can out of it. I precompute some of the math at program load into the array "array63_n[y][x]=n" to get the heavy math out of the loop. The array contains the raw shading values before they are combined with the background. It currently holds half a circle of values. I only want to speed up the function "color_change_bullet5_red".

for (y = 0; y <32; ++y)
{
  for (x = 0; x <64; ++x)
  {
      dx=x-31;
      dy=y-31;
      dist_squared=sqrt(dx * dx + dy * dy);
      n=outer_range-(9*dist_squared);
      if(dist_squared>4 && dist_squared<32 && n>0)
      {
        n=237 * n * .0039;
        array63_n[y][x]=n;//set value of partial equation to remove math from loop
      }
  }
}
void color_change_bullet5_red(BITMAP *bmp, SPRITE *spr)
{
int r=0;
unsigned int x=0;
unsigned int y=0;
int temp;
int n=0;
blit(buffer, bmp, spr->x, spr->y, 0, 0, 62, 62);//grabs background from game at x,y location
  for (y = 1; y <31; y++)
  {
    unsigned long address = bmp_write_line(bmp, y);
    unsigned long address_2 = bmp_write_line(bmp, 61-y);
    unsigned int z;
    for (x = 1; x<62; x++)
    {
      n=array63_n[y][x];
      if(n!=0)
      {
        z=x<<1;
        temp = bmp_read16(address+z);
        r=getr16(temp)+n;
        r = MIN(r, 255);
        bmp_write16(address+z, makecol16(r, getg16(temp), getb16(temp)));
        temp = bmp_read16(address_2+z);
        r=getr16(temp)+n;
        r = MIN(r, 255);
        bmp_write16(address_2+z, makecol16(r, getg16(temp), getb16(temp)));
      }
    }
  }
  bmp_unwrite_line(bmp);
  fblend_trans(bmp,buffer, spr->x, spr->y, 255);
}

I unrolled the inner loop to do all four quadrants instead of just the top and bottom halves, but the code actually slowed down just a tad. (The shading value is symmetrical around the circle.) This was the one change I made that did not increase speed: the inner loop was halved and two more bmp_read16/bmp_write16 segments were added to the above code.
for (x = 1; x<31; x++)
{
  // ... existing code for column x (z=x<<1), as above ...

  // added: apply the same shading value to the mirrored column 61-x
  z=(61-x)<<1;
  temp = bmp_read16(address+z);
  r=getr16(temp)+n;
  r = MIN(r, 255);
  bmp_write16(address+z, makecol16(r, getg16(temp), getb16(temp)));

  temp = bmp_read16(address_2+z);
  r=getr16(temp)+n;
  r = MIN(r, 255);
  bmp_write16(address_2+z, makecol16(r, getg16(temp), getb16(temp)));
}

Any ideas? Hacky stuff is welcome. This is for Windows with MinGW.

Bruce Perry

Member #270

April 2000

Unless I'm much mistaken, your color_change_bullet5_red function captures screen contents, blends the sphere into it manually, then calls an fblend function to write it back. Why not either blit it back (faster), or alternatively create your sprite on a black background and then fblend it back (possibly faster still)?

Be wary of extracting the maths into a lookup table. On all but the oldest computers, causing a memory cache miss is likely to be far more costly than doing some extra maths.

With that said, if you're willing to change the aesthetics a bit, you could consider eliminating the sqrt and just using dx*dx+dy*dy directly. If you do that, then you open up the following kind of possibility (I've been arbitrary with what x starts and ends at, sorry):

int dx_at_x_0 = 0;
int d_at_x_0 = dy*dy + dx_at_x_0*dx_at_x_0;

int d = d_at_x_0;
for (int dx = 0; dx < 64; dx++) {
    ... stuff goes here ...
    //d += (dx+1)*(dx+1) - dx*dx;
    //d += dx*dx+2*dx+1 - dx*dx;
    d += 2*dx+1;
}

(You can optimise that further in the same way by also putting 2*dx+1 into a variable and doing +=2 to it each time.)
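A minimal sketch of that further step, with the difference kept in its own variable so the loop body contains only additions. The function name squared_distance_row and the output array are hypothetical, chosen for illustration, and the row starts at dx = 0 as in the snippet above.

```c
#include <assert.h>

/* Fill out[0..width-1] with dx*dx + dy*dy for dx = 0..width-1.
   Hypothetical helper: only additions happen inside the loop. */
void squared_distance_row(int dy, int width, int *out)
{
    assert(width >= 0);
    int d = dy * dy;  /* squared distance at dx = 0 */
    int step = 1;     /* (dx+1)^2 - dx^2 = 2*dx + 1, which is 1 at dx = 0 */
    for (int dx = 0; dx < width; dx++) {
        out[dx] = d;
        d += step;    /* move to the next dx */
        step += 2;    /* the difference itself grows by 2 each time */
    }
}
```

For a circle centred at 31 you would start dx at -31 instead, which means seeding d with dy*dy + 31*31 and step with 2*(-31)+1.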

You can still do this with your current code (if it weren't for the lookup table), and then apply the sqrt to each pixel, but as maths goes, sqrt is a bit costly compared to the rest.

Another idea is to remove the dependence on makecol16 and friends from your main algorithm, and do some postprocessing that brings it in line with what the graphics card expects. I'm assuming makecol16 has to have different behaviour depending on the graphics card (RGB vs BGR), and that may not actually be the case, in which case you're unlikely to gain much if anything this way.
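As a rough illustration of working on the raw pixel bits, here is a sketch that adds to the red channel without going through getr16/makecol16. It assumes the pixel layout is RGB565 with red in the top five bits, which is exactly the thing that may differ between setups, and add_red_565 is a hypothetical name. Because it works in 5-bit precision, the rounding may differ slightly from the 8-bit route in the original function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 16-bit pixels are RGB565 with red in bits 11-15.
   Adds an 8-bit shading value n to the red channel, clamping at full red. */
uint16_t add_red_565(uint16_t pixel, int n)
{
    assert(n >= 0 && n <= 255);
    unsigned r5 = pixel >> 11;   /* existing 5-bit red */
    r5 += (unsigned)n >> 3;      /* scale n from 0..255 down to 0..31 */
    if (r5 > 31)
        r5 = 31;                 /* saturate instead of wrapping */
    /* keep green and blue bits untouched */
    return (uint16_t)((r5 << 11) | (pixel & 0x07FF));
}
```

If the lookup table stored n already shifted down to five bits, the shift inside the loop would disappear as well.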

Of all these ideas, I think the one I'd look into first and foremost is whether fblend can do your blending for you and you just need to poke pixels.

[EDIT]
Oops, I see - the amount of blending varies by pixel! Check if fblend is capable of reading a second bitmap as alpha or something like that, but if that's not an option, at least you could blit instead of fblending at the end.
[EDIT #2]
No wait, what you're doing looks like additive blending, no alpha required. You could get the necessary effect simply by drawing your circle on a black background and then using an additive blender. Where the sprite is black, the background won't change, and increasing brightness pixels will have an increasing effect on the background.
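To show why a black background works, here is a small sketch of what an additive blender does per pixel. It is not FBlend's actual implementation; the function names are hypothetical and an RGB565 layout is assumed. Each channel is added and clamped, so a black source pixel (all zeros) leaves the destination unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating add of one channel that is `bits` wide, located at `shift`. */
static uint16_t add_channel(uint16_t dst, uint16_t src, int shift, int bits)
{
    assert(bits > 0 && bits <= 8);
    unsigned max = (1u << bits) - 1;
    unsigned sum = ((dst >> shift) & max) + ((src >> shift) & max);
    if (sum > max)
        sum = max;               /* clamp at full brightness */
    return (uint16_t)(sum << shift);
}

/* Additive blend of two RGB565 pixels (assumed layout: 5 red, 6 green, 5 blue). */
uint16_t add_blend_565(uint16_t dst, uint16_t src)
{
    return (uint16_t)(add_channel(dst, src, 11, 5)
                    | add_channel(dst, src, 5, 6)
                    | add_channel(dst, src, 0, 5));
}
```

With a sprite that is red-on-black, only the red channel of the background changes, which matches what the hand-written loop does.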

--
Bruce "entheh" Perry [ Web site | DUMB | Set Up Us The Bomb !!! | Balls ]
Programming should be fun. That's why I hate C and C++.
The brxybrytl has you.

Madhura

Member #16,060

August 2015

I did that direct video memory code thing for the pixel reading and writing. It reads in the address of an entire line at a time.

I originally had all the math at the start or in the loop where needed and then improved the math as much as I could. I then began offloading parts of the math to the array and saw great speed increases. So even with cache misses there is such a load of math contained in that single n result from the table it appears to be worth it empirically speaking.

At the end of the day it is just additive blending. I started out trying to use the blending routines but I always ended up with some weird side effect. A key word in what you talked about was how black has no effect on the blend. I bet I always had pink and the background in what I blended. I will make another attempt at the additive blend functions with the black background and see what happens.

Thanks

EDIT
Sure enough, the black background made the additive FBlend function work correctly. I got a 10% speed increase over my pixel-by-pixel method. I can also dump the arrays, since a simple image will be the storage medium for the values. I am glad that you looked past improving the code to changing the algorithm itself.

I am happy that I got this close to what FBlend could do. Of course, I had lots of extra constructs gobbling up resources to get that close.

Thanks again.

anto80

Member #3,230

February 2003

Hmm, sorry, but are you using Allegro 4.x or 5.x?

In Allegro 4.x, I agree with you: direct video memory access is the fastest way (as in the fire effect in the old Allegro 3.1x code examples).

In Allegro 5.0.x, I would opt to:
- draw bitmap regions if you want to do on-the-fly drawing,
- or lock/unlock the bitmap if you want to preprocess your image and store it for future drawing.

___________
Currently working on his action/puzzle game CIPHER PUSHER : Blocks/Vortexes/Seafood! Facebook - Twitter - webpage

Bruce Perry

Member #270

April 2000

You're welcome.

It's Allegro 4. In Allegro 5, I'd have said use a shader and be done with it.


anto80

Member #3,230

February 2003

If you want to avoid using "for" loops, you could use memmove/memcpy. But the result may be slightly different (I can't remember in which order the R, G and B components of 16-bit values are stored in memory).

Btw, shaders are reserved for 5.1.x users (what I was proposing relates to 5.0.x).
