|
Any ideas on how to speed this up? |
Madhura
Member #16,060
August 2015
|
This code creates a circle that blends more and more as it radiates out from the center. It then uses additive blending with the background. I have done what I can think of to speed this code up and it is really fast now, BUT since this is one of the main bottlenecks I wanted to squeeze whatever else I can out of it. I calculated some of the math ahead of time at program load-up with the array "array63_n[y][x]=n" to get the heavy math out of the loop. The array contains the raw shading values before they are combined with the background. It currently computes half a circle of values. I only want to speed up the function "color_change_bullet5_red".
for (y = 0; y < 32; ++y)
{
    for (x = 0; x < 64; ++x)
    {
        dx = x - 31;
        dy = y - 31;
        dist_squared = sqrt(dx * dx + dy * dy);
        n = outer_range - (9 * dist_squared);

        if (dist_squared > 4 && dist_squared < 32 && n > 0)
        {
            n = 237 * n * .0039;
            array63_n[y][x] = n; //set value of partial equation to remove math from loop
        }
    }
}

void color_change_bullet5_red(BITMAP *bmp, SPRITE *spr)
{
    int r = 0;
    unsigned int x = 0;
    unsigned int y = 0;
    int temp;
    int n = 0;
    blit(buffer, bmp, spr->x, spr->y, 0, 0, 62, 62); //grabs background from game at x,y location
    for (y = 1; y < 31; y++)
    {
        unsigned long address = bmp_write_line(bmp, y);
        unsigned long address_2 = bmp_write_line(bmp, 61 - y);
        unsigned int z;
        for (x = 1; x < 62; x++)
        {
            n = array63_n[y][x];
            if (n != 0)
            {
                z = x << 1;
                temp = bmp_read16(address + z);
                r = getr16(temp) + n;
                r = MIN(r, 255);
                bmp_write16(address + z, makecol16(r, getg16(temp), getb16(temp)));

                temp = bmp_read16(address_2 + z);
                r = getr16(temp) + n;
                r = MIN(r, 255);
                bmp_write16(address_2 + z, makecol16(r, getg16(temp), getb16(temp)));
            }
        }
    }
    bmp_unwrite_line(bmp);
    fblend_trans(bmp, buffer, spr->x, spr->y, 255);
}
I unrolled the inner loop to do all four corners instead of just the top and bottom, but the code actually slowed down just a tad. The shading value is symmetrical around the circle. The slowdown is slight, but this was the one change I made that did not increase speed. The inner loop was halved and two more bmp_write16 segments, like the one starting with temp = bmp_read16(address_2+z);, were added to the above code. Any ideas? Hacky type of stuff is welcome. This is for Windows MinGW. |
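For readers following along, the four-corner idea described above could look roughly like the sketch below. It is not the poster's code: it uses a plain 2D array of red values in place of an Allegro BITMAP, and the names SIZE, HALF, add_sat and blend_four_way are made up for illustration. It also assumes a table that is symmetric about the middle of a 62x62 sprite, which differs slightly from the centre-at-31 table in the post.

```c
#include <stdint.h>

#define SIZE 62          /* sprite is 62x62, matching the blit call in the post */
#define HALF (SIZE / 2)  /* one quadrant is 31x31 (assumed layout) */

/* Saturating add of a shade value to an 8-bit channel. */
static uint8_t add_sat(uint8_t channel, int n)
{
    int v = channel + n;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Apply one quadrant of shade values to all four mirrored positions.
   red[][] is a hypothetical stand-in for the bitmap's red channel. */
void blend_four_way(uint8_t red[SIZE][SIZE], uint8_t shade[HALF][HALF])
{
    for (int y = 0; y < HALF; y++) {
        int my = SIZE - 1 - y;              /* mirrored row */
        for (int x = 0; x < HALF; x++) {
            int n = shade[y][x];
            if (n == 0)
                continue;                   /* nothing to add here */
            int mx = SIZE - 1 - x;          /* mirrored column */
            red[y][x]   = add_sat(red[y][x], n);
            red[y][mx]  = add_sat(red[y][mx], n);
            red[my][x]  = add_sat(red[my][x], n);
            red[my][mx] = add_sat(red[my][mx], n);
        }
    }
}
```

One possible reason a version like this is not faster is that each iteration touches four separate regions of memory, two of them walking backwards. That is a guess, not something measured here.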
Bruce Perry
Member #270
April 2000
|
Unless I'm much mistaken, your color_change_bullet5_red function captures screen contents, blends the sphere into it manually, then calls an fblend function to write it back. Why not either blit it back (faster), or alternatively create your sprite on a black background and then fblend it back (possibly faster still)?

Be wary of extracting the maths into a lookup table. On all but the oldest computers, causing a memory cache miss is likely to be far more costly than doing some extra maths.

With that said, if you're willing to change the aesthetics a bit, you could consider eliminating the sqrt and just using dx*dx+dy*dy directly. If you do that, then you open up the following kind of possibility (I've been arbitrary with what x starts and ends at, sorry):

    int dx_at_x_0 = 0;
    int d_at_x_0 = dy*dy + dx_at_x_0*dx_at_x_0;
    int d = d_at_x_0;
    for (int dx = 0; dx < 64; dx++) {
        ... stuff goes here ...
        //d += (dx+1)*(dx+1) - dx*dx;
        //d += dx*dx+2*dx+1 - dx*dx;
        d += 2*dx+1;
    }

(You can optimise that further in the same way by also putting 2*dx+1 into a variable and doing +=2 to it each time.)

You can still do this with your current code (if it weren't for the lookup table), and then apply the sqrt to each pixel, but as maths goes, sqrt is a bit costly compared to the rest.

Another idea is to remove the dependence on makecol16 and friends from your main algorithm, and do some postprocessing that brings it in line with what the graphics card expects. I'm assuming makecol16 has to have different behaviour depending on the graphics card (RGB vs BGR), and that may not actually be the case, in which case you're unlikely to gain much if anything this way.

Of all these ideas, I think the one I'd look into first and foremost is whether fblend can do your blending for you and you just need to poke pixels.

[EDIT] -- |
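As a small, self-contained illustration of the parenthetical about keeping 2*dx+1 in its own variable, here is a sketch that fills one row of squared distances using only additions inside the loop. The function name squared_distance_row and its parameters are hypothetical, and dx starts at 0 as in the snippet above rather than at the poster's actual offsets.

```c
/* Fill out[0..width-1] with dx*dx + dy*dy for dx = 0..width-1.
   Only additions happen inside the loop (forward differencing). */
void squared_distance_row(int dy, int width, int *out)
{
    int d = dy * dy;   /* squared distance at dx = 0 */
    int step = 1;      /* value of 2*dx + 1 at dx = 0 */
    for (int dx = 0; dx < width; dx++) {
        out[dx] = d;
        d += step;     /* (dx+1)^2 - dx^2 = 2*dx + 1 */
        step += 2;     /* consecutive odd numbers differ by 2 */
    }
}
```

For dy = 3 and width = 5 this produces 9, 10, 13, 18, 25, which matches dx*dx + dy*dy computed directly.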
Madhura
Member #16,060
August 2015
|
I did that direct video memory code thing for the pixel reading and writing. It reads in the address of an entire line at a time. I originally had all the math at the start or in the loop where needed and then improved the math as much as I could. I then began offloading parts of the math to the array and saw great speed increases. So even with cache misses, there is such a load of math contained in that single n result from the table that it appears to be worth it, empirically speaking.

At the end of the day it is just additive blending. I started out trying to use the blending routines but I always ended up with some weird side effect. A key point in what you said was that black has no effect on the blend. I bet I always had pink and the background in what I blended. I will make another attempt at the additive blend functions with the black background and see what happens. Thanks.

EDIT: I am happy that I got this close to what FBlend could do. Of course I had lots of extra constructs gobbling up resources to get that close. Thanks again. |
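The point about black having no effect can be checked with a minimal sketch of a saturating additive blend on 16-bit pixels. It assumes an RGB565 layout (red in bits 11-15, green in bits 5-10, blue in bits 0-4); the real layout depends on the driver, and add_blend_565 is a made-up name, not an Allegro or FBlend function.

```c
#include <stdint.h>

/* Saturating per-channel add of two 16-bit pixels.
   Assumed layout: RRRRRGGG GGGBBBBB (RGB565). */
uint16_t add_blend_565(uint16_t dst, uint16_t src)
{
    unsigned r = (dst >> 11) + (src >> 11);
    unsigned g = ((dst >> 5) & 0x3F) + ((src >> 5) & 0x3F);
    unsigned b = (dst & 0x1F) + (src & 0x1F);
    if (r > 0x1F) r = 0x1F;   /* clamp each channel at its maximum */
    if (g > 0x3F) g = 0x3F;
    if (b > 0x1F) b = 0x1F;
    return (uint16_t)((r << 11) | (g << 5) | b);
}
```

Adding a black source pixel (0x0000) returns the destination unchanged, while a mask-pink source pixel (0xF81F under this layout) pushes red and blue to full, which is consistent with the side effect described above.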
anto80
Member #3,230
February 2003
|
Hmm sorry, but are you using Allegro 4.x or 5.x? In Allegro 4.x, I agree with you, direct access to video memory is the fastest way (like in the fire effect in the old Allegro 3.1x code examples). In Allegro 5.0.x, I would opt to : ___________ |
Bruce Perry
Member #270
April 2000
|
You're welcome. It's Allegro 4. In Allegro 5, I'd have said use a shader and be done with it -- |
anto80
Member #3,230
February 2003
|
Maybe if you want to avoid using "for"-loops, you could use memmove/memcpy. But the result may be slightly different (I can't remember in which order the R, G and B parts of a 16-bit value are stored in memory.) Btw, shaders are reserved for 5.1.x users (what I was proposing is 5.0.x related). ___________ |
|