Allegro.cc - Online Community

Allegro.cc Forums » Programming Questions » Value Noise optimisation

This thread is locked; no one can reply to it. rss feed Print
Value Noise optimisation
SiegeLord
Member #7,827
October 2006
avatar

I am making a game where the users play in a randomly generated galaxy. I make it using a collection of brushes which I then perturb using a value noise function that I salvaged from this site.

Normally I wouldn't ask a question as broad as this, but I am reaching the limits of my abilities. I wish to have any comments on possible optimizations that can be done with the functions associated with the actual perturbation process.

I call the PerturbGalaxy function and it calls the rest of the functions as appropriate. It returns usually after ~40 seconds (all compiler optimizations turned on) on my 1.58 Ghz AMD.

1int primes[][3] =
2{
3 {15731, 789221, 1376312589},
4 {15271, 789533, 1376312489},
5 {15277, 789557, 1376312491},
6 {15287, 789571, 1376312501},
7 {15289, 789577, 1376312507},
8 {15299, 789587, 1376312513},
9 {15307, 789589, 1376312527},
10 {15313, 789611, 1376312543},
11 {15319, 789623, 1376312551},
12 {15329, 789631, 1376312579},
13 {15331, 789653, 1376312627},
14 {15349, 789671, 1376312629},
15 {15359, 789673, 1376312657},
16 {15361, 789683, 1376312687},
17 {15373, 789689, 1376312689},
18 {15377, 789709, 1376312753},
19 {15383, 789713, 1376312783},
20 {15391, 789721, 1376312789},
21 {15401, 789731, 1376312813},
22 {15413, 789739, 1376312857},
23 {15427, 789749, 1376312879}
24};
25 
26#define _GALAXY_RADIUS 1024
27#define _NUM_SEEDS 20
28 
29int g_nSeed = 0;
30 
31#define _RAND_TABLE_SIZE (0x000fffff)
32 
33float g_nRandTable[_RAND_TABLE_SIZE][2];
34 
35float prand2(int x, int y)
36{
37 int n = x + y * 57;
38 n = (n<<13) ^ n;
39 return ( 1.0 - ( (n * (n * n * primes[g_nSeed][0] + primes[g_nSeed][1]) + primes[g_nSeed][2]) & 0x7fffffff) / 1073741824.0);
40}
41 
42float prand(int x, int y)
43{
44 int n = unsigned int(x + y * 57) & (0x000fffff);
45 return g_nRandTable[n][g_nSeed];
46}
47 
48void psrand(int seed)
49{
50 int a = _RAND_TABLE_SIZE;
51 for(int ii = 0; ii < _RAND_TABLE_SIZE; ii++)
52 {
53 g_nSeed = seed % _NUM_SEEDS;
54 g_nRandTable[ii][0] = prand2(ii, 0);
55
56 g_nSeed = (seed + 1) % _NUM_SEEDS;
57 g_nRandTable[ii][1] = prand2(ii, 0);
58 }
59}
60 
61float SmoothNoise(float x, float y)
62{
63 float corners =
64 (prand(x - 1, y - 1) +
65 prand(x + 1, y - 1) +
66 prand(x - 1, y + 1) +
67 prand(x + 1, y + 1)) / 16;
68 float sides = (prand(x - 1, y) + prand(x + 1,y) + prand(x, y - 1) +prand(x, y + 1) ) / 8;
69 float center = prand(x, y) / 4;
70 return corners + sides + center;
71}
72 
73float Interpolate(float a, float b, float x)
74{
75 float f = x * x * (3 - 2 * x);//3x^2-2x^3
76 
77 return a + (b - a) * f;
78}
79 
80float InterpolatedNoise(float x, float y)//, bool first = false)
81{
82 float v1;
83 float v2;
84 float v3;
85 float v4;
86 
87 int integer_X = int(x);
88 float fractional_X = x - integer_X;
89 
90 int integer_Y = int(y);
91 float fractional_Y = y - integer_Y;
92 
93 v1 = SmoothNoise(integer_X, integer_Y);
94 v2 = SmoothNoise(integer_X + 1, integer_Y);
95 v3 = SmoothNoise(integer_X, integer_Y + 1);
96 v4 = SmoothNoise(integer_X + 1, integer_Y + 1);
97 
98 float i1 = Interpolate(v1 , v2 , fractional_X);
99 float i2 = Interpolate(v3 , v4 , fractional_X);
100 
101 return Interpolate(i1 , i2 , fractional_Y);
102}
103 
104 
105float PerlinNoise_2D(float x, float y, float persistence = 1.0f, int Number_Of_Octaves = 3)
106{
107 float total = 0.0f;
108 float p = persistence;
109 int n = Number_Of_Octaves - 1;
110 
111 float frequency = 1;
112 float amplitude = 1;
113 
114 for(int ii = 0; ii < n; ii++)
115 {
116 frequency = frequency * 2;
117 amplitude = amplitude * p;
118 
119 total = total + InterpolatedNoise(x * frequency, y * frequency) * amplitude;
120 }
121 return total;
122}
123 
124BITMAP* PerturbGalaxy(BITMAP* galaxy, int seed = 0, float factor = 40.0f, int tile_size = 128,
125 float persistence = .9, int octaves = 10)
126{
127 BITMAP* ret = create_bitmap_ex(8, galaxy->w, galaxy->h);
128 
129 psrand(seed);
130 
131 for(int y = 0; y < ret->h; y++)
132 {
133 for(int x = 0; x < ret->w; x++)
134 {
135 g_nSeed = 0;
136 int dispX = factor * PerlinNoise_2D(float(x) / tile_size, float(y) / tile_size, persistence, octaves) + x;
137 g_nSeed = 1;
138 int dispY = factor * PerlinNoise_2D(float(x) / tile_size, float(y) / tile_size, persistence, octaves) + y;
139 
140 if(dispX < 0)
141 dispX = 0;
142 if(dispY < 0)
143 dispY = 0;
144 
145 if(dispX >= ret->w)
146 dispX = ret->w - 1;
147 if(dispY >= ret->h)
148 dispY = ret->h - 1;
149 
150 ret->line[y][x] = galaxy->line[dispY][dispX];
151 }
152 }
153 destroy_bitmap(galaxy);
154 return ret;
155}

And for the curious here's what the end result looks like:

{"name":"590913","src":"\/\/djungxnpq2nug.cloudfront.net\/image\/cache\/9\/6\/96ad169bcd47a12eaf2f683fdc8b935b.jpg","w":817,"h":635,"tn":"\/\/djungxnpq2nug.cloudfront.net\/image\/cache\/9\/6\/96ad169bcd47a12eaf2f683fdc8b935b"}590913

Also, a related question, would you think that 40secs is too much to wait for a map to be created? It would be done before every match...

Thanks.

"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18
[SiegeLord's Abode][Codes]:[DAllegro5]:[RustAllegro]

Bruce Perry
Member #270
April 2000

OK, this is going to be painful ... ;D

Keep the current copy in case you decide to go back to it.

I'd say start by inlining as many functions as you can. At the moment, the code is written very modularly with the lower levels quite generic; you can probably flatten it out and replace parameters with hardcoded values, unroll loops, and then remove unnecessary work. For example, I haven't looked closely but if you unroll that octave iteration, then the highest frequency in the Perlin noise will not need interpolating if the point spacing is one pixel.

Having done that, you'll find you have those pseudo-random numbers being calculated for every pixel and then interpolated.

First, arrange to calculate them on demand and then keep them around until you've finished with them. There will be runs of pixels within which every pixel is calculated using the same four points with increasing x-interpolation coefficients and constant y-interpolation coefficients. You should be able to get the four values once, precalculate the y-interpolation, and then just do the x-interpolation per pixel, using the same two values for every pixel in that run. This will enormously reduce the number of times you calculate those values.

It could get tricky if you try to do all these in one loop, so you might want to precalculate the noise value per pixel and do it in several passes. You could run through an array of floats and put the high-frequency noise in, then run through again and add in the mid-frequency noise using the above interpolation trick, then run through again and do the low-frequency noise - with as many passes as necessary. Having done all that, you may not even have to unroll the octave loop - except perhaps for the highest frequency if it does indeed not need interpolating!

Finally, when you do parabolic interpolation like what you've got there (parabolic means A*x^n + B*x^(n-1) + C*x^(n-2) + ... + D*x^2 + E*x + F), you can usually calculate some delta values (like differentiation but with discrete steps) and get rid of the multiplication. The trick is to work out an equation for how much the value of the interpolation function will change per pixel, so df = f(x+1) - f(x). If that is variable, work out how much THAT will change per pixel (ddf = df(x+1) - df(x)). Eventually you'll end up with a constant, and you'll have some code inside the loop for updating the value as follows:

f+=df;
df+=ddf;
ddf+=dddf;//this would be the last line if dddf turned out to be constant

You then just need to set f, df, etc. to their initial values before entering the loop, and bam, your per-pixel multiplications have gone.

Let us know if you want any steps clarifying or fleshing out.

Depending on the game, another trick you could do (though it might clash with the above optimisations) is to calculate a coarse image, set the game going, then refine the image over multiple steps - so the game can start even though it isn't fully formed.

Looks good by the way ;D

--
Bruce "entheh" Perry [ Web site | DUMB | Set Up Us The Bomb !!! | Balls ]
Programming should be fun. That's why I hate C and C++.
The brxybrytl has you.

SiegeLord
Member #7,827
October 2006
avatar

The inlining did not increase the speed much, only about 1-2 seconds of speedup. Your suggestion to remove the redundancies in calculation did move me to reexamine my pseudo-random table generation. I have modified it such that it would generate a smoothed noise from the start, and since the table has less entries than the galaxy bitmap has pixels, the smoothing convolution is done less number of times. The speed increase was substantial: the generation time is now ~16 seconds.

Thank you for your suggestions.

"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18
[SiegeLord's Abode][Codes]:[DAllegro5]:[RustAllegro]

Jakub Wasilewski
Member #3,653
June 2003
avatar

I have an optimised implementation for something like that. It's not Perlin noise, but it is very similar - specifically, the final output is a sum of 2D random functions with varying frequency and amplitude. The only difference is that I don't use specific pseudo-random functions, but use the built-in (or other) RNG.

The workhorse function is provided below. It does one pass over the terrainData array, adding the values from a new function. The amplitude parameter specifies the largest displacement that could be added, and the "tileSize" parameter determines frequency. Basically, it's the distance betweeen two genuine random values, and everything in between is interpolated.

I use cosine interpolation, but you can easily change that and calculate different mu coefficients in the first few lines.

1void TerrainGen::onePass(int tileSize, float amplitude)
2{
3 // cosine interpolation coefficients
4
5 float *muVal = new float[tileSize];
6 float mu = 0.0, muStep = PI / tileSize;
7
8 for (int i = 0; i < tileSize; i++)
9 {
10 muVal<i> = 0.5 - std::cos(mu) * 0.5;
11
12 mu += muStep;
13 }
14 
15 int horizTileCount = (int)std::ceil((float)TERRAIN_WIDTH / tileSize) + 1;
16 float *topVals = new float[horizTileCount], *botVals = new float[horizTileCount], *temp;
17
18 for (int i = 0; i < horizTileCount; i++)
19 {
20 topVals<i> = randomFloat(-amplitude, +amplitude);
21 botVals<i> = randomFloat(-amplitude, +amplitude);
22 }
23
24 bool innerCont = true, outerCont = true;
25
26 int y = 0;
27 int x = 0, xTile = 0, xModulo;
28
29 float *adderPtr = &terrainData[0][0], *lineEndPtr, *dataEndPtr;
30 float leftValue, rightValue;
31
32 dataEndPtr = &terrainData[TERRAIN_HEIGHT][0];
33 do
34 {
35 for(int yModulo = 0; yModulo < tileSize && adderPtr != dataEndPtr; yModulo++, y++)
36 {
37 lineEndPtr = &terrainData[y+1][0];
38 xTile = 0; innerCont = true;
39
40 do
41 {
42 leftValue = topVals[xTile] + (botVals[xTile] - topVals[xTile]) * muVal[yModulo];
43 rightValue = topVals[xTile+1] + (botVals[xTile+1] - topVals[xTile+1]) * muVal[yModulo];
44
45 for(xModulo = 0; (xModulo < tileSize) && (adderPtr != lineEndPtr); xModulo++, x++)
46 (*adderPtr++) += leftValue + (rightValue - leftValue) * muVal[xModulo];
47
48 innerCont = adderPtr != lineEndPtr;
49 xTile++;
50
51 } while(innerCont);
52
53 outerCont = adderPtr != dataEndPtr;
54 }
55
56 temp = topVals; topVals = botVals; botVals = temp;
57 for (int i = 0; i < horizTileCount; i++)
58 botVals<i> = randomFloat(-amplitude, +amplitude);
59
60 } while(outerCont);
61
62 delete [] muVal;
63 delete [] topVals;
64 delete [] botVals;
65}

This could calculate several 640x480 with 64 base tile size (so, 8 iterations of onePass) landscapes in a second on my Athlon XP, if I recall correctly (including post-processing operations like ensuring the proper area under-water, generating the map bitmap and so on). You can optimize it further probably, especially if it is guaranteed that TERRAIN_WIDTH and TERRAIN_HEIGHT are always divisible by tileSize.

---------------------------
[ ChristmasHack! | My games ] :::: One CSS to style them all, One Javascript to script them, / One HTML to bring them all and in the browser bind them / In the Land of Fantasy where Standards mean something.

Bob
Free Market Evangelist
September 2000
avatar

Notice that you're calling prand() about 600 times for each pixel in your final image.

Instead, you may want to precompute the prand() for each (x,y) location you're interested in, then perform a (separable) convolution to sum up and scale each component of the noise.

That should greatly speed up the process.

--
- Bob
[ -- All my signature links are 404 -- ]

Sirocco
Member #88
April 2000
avatar

Quote:

Also, a related question, would you think that 40secs is too much to wait for a map to be created?

Yes, definitely. If a map can't be generated in under 5 seconds or so I'd be extremely disappointed. 40 seconds is too long to wait for anything when you're a gamer :)

As Bob mentioned all the random number calls are bogging things down. In instances where I've needed fast noise generation I've often employed short arrays of noise offsets and used a single random number call for each vertical line to determine which offset array to use. So instead of doing 307,200 prands for each frame you'd be doing either 640 or 480 (for a 640x480 screen).

-->
Graphic file formats used to fascinate me, but now I find them rather satanic.

SiegeLord
Member #7,827
October 2006
avatar

Okay, I made several more optimizations, and got the generation speed down to 6-7 seconds. I decided not to utilize the multi-pass approach, since my initial tests did not result in any speed increase... At this point, the generation time is to my liking, and I shall wait until more problems arise before optimizing further.

Here's my newest version of the associated code:

1#define _GALAXY_RADIUS 1024
2 
3int primes[][3] =
4{
5 {15731, 789221, 1376312589},
6 {15271, 789533, 1376312489},
7 {15277, 789557, 1376312491},
8 {15287, 789571, 1376312501},
9 {15289, 789577, 1376312507},
10 {15299, 789587, 1376312513},
11 {15307, 789589, 1376312527},
12 {15313, 789611, 1376312543},
13 {15319, 789623, 1376312551},
14 {15329, 789631, 1376312579},
15 {15331, 789653, 1376312627},
16 {15349, 789671, 1376312629},
17 {15359, 789673, 1376312657},
18 {15361, 789683, 1376312687},
19 {15373, 789689, 1376312689},
20 {15377, 789709, 1376312753},
21 {15383, 789713, 1376312783},
22 {15391, 789721, 1376312789},
23 {15401, 789731, 1376312813},
24 {15413, 789739, 1376312857},
25 {15427, 789749, 1376312879}
26};
27 
28#define _NUM_SEEDS 20
29 
30int g_nSeed = 0;
31 
32#define _RAND_TABLE_SIZE (0x000fffff)
33 
34float g_nRandTable[_RAND_TABLE_SIZE][2];
35 
36float prand2(int x, int y)
37{
38 int n = x + y * 57;
39 n = (n<<13) ^ n;
40 return ( 1.0 - ( (n * (n * n * primes[g_nSeed][0] + primes[g_nSeed][1]) + primes[g_nSeed][2]) & 0x7fffffff) / 1073741824.0);
41}
42 
43void psrand(int seed)
44{
45 int a = _RAND_TABLE_SIZE;
46 
47 float *temp = new float[_RAND_TABLE_SIZE * 2];
48 
49 for(int ii = 0; ii < _RAND_TABLE_SIZE; ii++)
50 {
51 g_nSeed = seed % _NUM_SEEDS;
52 temp[2 * ii] = prand2(ii, 0);
53
54 g_nSeed = (seed + 1) % _NUM_SEEDS;
55 temp[2 * ii + 1] = prand2(ii, 0);
56 }
57 
58 for(int ii = 0; ii < _RAND_TABLE_SIZE; ii++)
59 {
60 float sub_total1 = 0.0f;
61 float sub_total2 = 0.0f;
62 float total1 = 0.0f;
63 float total2 = 0.0f;
64 
65 //sides
66 int index = unsigned int(ii - 1) & _RAND_TABLE_SIZE;
67 sub_total1 += temp[index * 2];
68 sub_total2 += temp[index * 2 + 1];
69 
70 index = unsigned int(ii - 57) & _RAND_TABLE_SIZE;
71 sub_total1 += temp[index * 2];
72 sub_total2 += temp[index * 2 + 1];
73 
74 index = unsigned int(ii + 57) & _RAND_TABLE_SIZE;
75 sub_total1 += temp[index * 2];
76 sub_total2 += temp[index * 2 + 1];
77 
78 index = unsigned int(ii + 1) & _RAND_TABLE_SIZE;
79 sub_total1 += temp[index * 2];
80 sub_total2 += temp[index * 2 + 1];
81
82 total1 += sub_total1 * 0.125f;
83 total2 += sub_total2 * 0.125f;
84 
85 sub_total1 = 0;
86 sub_total2 = 0;
87 
88 //corners
89 
90 index = unsigned int(ii - 1 - 57) & _RAND_TABLE_SIZE;
91 sub_total1 += temp[index * 2];
92 sub_total2 += temp[index * 2 + 1];
93 
94 index = unsigned int(ii - 1 + 57) & _RAND_TABLE_SIZE;
95 sub_total1 += temp[index * 2];
96 sub_total2 += temp[index * 2 + 1];
97 
98 index = unsigned int(ii + 1 + 57) & _RAND_TABLE_SIZE;
99 sub_total1 += temp[index * 2];
100 sub_total2 += temp[index * 2 + 1];
101 
102 index = unsigned int(ii + 1 - 57) & _RAND_TABLE_SIZE;
103 sub_total1 += temp[index * 2];
104 sub_total2 += temp[index * 2 + 1];
105 
106 total1 += sub_total1 * 0.0625f;
107 total2 += sub_total2 * 0.0625f;
108 
109 total1 += temp[index * 2] * 0.25f;
110 total2 += temp[index * 2 + 1] * 0.25f;
111 
112 g_nRandTable[index][0] = total1;
113 g_nRandTable[index][1] = total2;
114 }
115 delete[] temp;
116}
117 
118float InterpolatedNoise(float x, float y)
119{
120 
121 float v1;
122 float v2;
123 float v3;
124 float v4;
125 
126 int integer_X = int(x);
127 float fractional_X = x - integer_X;
128 
129 int integer_Y = int(y);
130 float fractional_Y = y - integer_Y;
131 
132 int n = integer_X + integer_Y * 57;
133 
134 v1 = g_nRandTable[n & _RAND_TABLE_SIZE][g_nSeed];
135 v2 = g_nRandTable[(n + 1) & _RAND_TABLE_SIZE][g_nSeed];
136 v3 = g_nRandTable[(n + 57) & _RAND_TABLE_SIZE][g_nSeed];
137 v4 = g_nRandTable[(n + 58) & _RAND_TABLE_SIZE][g_nSeed];
138 
139 float i1 = v1 + (v2 - v1) * fractional_X;
140 float i2 = v3 + (v4 - v3) * fractional_X;
141 
142 return i1 + (i2 - i1) * fractional_Y;
143}
144 
145 
146float PerlinNoise_2D(float x, float y, float persistence = 1.0f, int Number_Of_Octaves = 3)
147{
148 float total = 0.0f;
149 float p = persistence;
150 int n = Number_Of_Octaves - 1;
151 
152 float frequency = 1;
153 float amplitude = 1;
154 
155 for(int ii = 0; ii < n; ii++)
156 {
157 frequency *= 2;
158 amplitude *= p;
159 
160 total += InterpolatedNoise(x * frequency, y * frequency) * amplitude;
161 }
162 return total;
163}
164 
165BITMAP* PerturbGalaxy(BITMAP* galaxy, int seed = 0, float factor = 40.0f, int tile_size = 128,
166 float persistence = .9, int octaves = 10)
167{
168 BITMAP* ret = create_bitmap_ex(8, galaxy->w, galaxy->h);
169 
170 psrand(seed);
171
172 for(int y = 0; y < ret->h; y++)
173 {
174 for(int x = 0; x < ret->w; x++)
175 {
176 float X = float(x) / tile_size;
177 float Y = float(y) / tile_size;
178 
179 g_nSeed = 0;
180 int dispX = factor * PerlinNoise_2D(X, Y, persistence, octaves) + x;
181 g_nSeed = 1;
182 int dispY = factor * PerlinNoise_2D(X, Y, persistence, octaves) + y;
183 
184 if(dispX < 0)
185 dispX = 0;
186 if(dispY < 0)
187 dispY = 0;
188 
189 if(dispX >= ret->w)
190 dispX = ret->w - 1;
191 if(dispY >= ret->h)
192 dispY = ret->h - 1;
193 
194 ret->line[y][x] = galaxy->line[dispY][dispX];
195 }
196 }
197 destroy_bitmap(galaxy);
198 return ret;
199}

Sirocco said:

40 seconds is too long to wait for anything when you're a gamer

This shall be a turn based game, and I shall implement a minimum time interval that a turn has to take, as to combat that rushing mentality. ;)

"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18
[SiegeLord's Abode][Codes]:[DAllegro5]:[RustAllegro]

gillius
Member #119
April 2000

Is this a network game? If so you could have the computers all generate the map in parallel and break up the work ;).

Gillius
Gillius's Programming -- https://gillius.org/

Go to: