Fast loops?
gnolam

What's the fastest way of doing a loop like this:

1#define MAX_DIST 30
2 
3void render_screen(void)
4{
5 int y,x;
6 int dist;
7 
8 for (y = 0;y < 480;y++)
9 {
10 //do some calculations
11 for (x = 0;x < 640;x++)
12 {
13 // do some more calculations
14 for (dist = 0;dist < MAX_DIST;dist++)
15 {
16 //do even more of them
17 }
18 }
19 }
20}

With a loop like the example, without doing anything besides just running the loop, I get around 10 FPS! (with optimization enabled (-O2) in gcc, around 30). This is far too slow to be useful... and this is just with MAX_DIST set to 30 (not very far).

The reason I'm asking is because I have just resurrected my old voxel project, and I have to raytrace each pixel of the screen each frame (That is, unless I magically come up with how to do wave surfing with 5 degrees of freedom instead of 4 overnight).

I have already planned to use lookup tables for all that nasty trigonometry, but I can't find a way to make the loops themselves run any faster... does anyone have any tricks to speed them up?

Gabhonga

try to make it a single loop, if possible, in example:

for(i;i<(640*480*MAX_DIST);i++) {...}

other ways of making the loop faster would depend very much on the actual work done inside...have you tried enabling you compiler's optimizer, btw? ;)

Bob

Consider that you're doing 30*640*480 itterations each time; that's 9216000 iterations. If you have a 4 cycle average loop overhead (which is typical), you'll end up with 36864000 cycles of wasted time.

You're better off trying to do less work. For example, don't update the whole screen, but only the parts that have changed. Don't loop 30 times per pixel if you can help it; see if you can find a way to do it in 1 or 2 iterations of that inner loop.

LEDominant

Sounds like you are trying to do a true 3D raytracing engine. Programs like 3D Studios and POV-Ray use that. :o. See the problem here.

IT CAN TAKE HOURS TO RENDER A FRAME

Try doing Height-map Voxel(Delta Force), Sector Voxel(Duke Nukem 3D) or BSP Voxel(Doom).

Plucky

Hmmm. For my own voxel engine project I get ~60-100+ fps (depending on the number steps with matching step size) on my Athlon 1.2GHz, GeForce2 computer. However I'm not quite as ambitious as you are with 640x480. What I'm doing:

  • 360x360 viewport, 16 bit color

  • bilinear filtering

  • 200-1000+ steps

</li>
However my main voxel loop is a bit different than yours:

/* essentially: */
//blah, blah
for (column=0; column<360; column++) {
   // blah, blah
   for (step = 0; step<MAXSTEP; step++) {
      // Get Terrain height
      while (ray < terrain height) {
         // check if we're still in viewport
         // putpixel
         // move ray up and other blahs
      }
      // move ray and other blah, blahs
   }
   // other blah, blahs
}

Perhaps your voxel algorithm is not particularly efficient.
I'm using the ray casted wave surfing technique; at each step I'm finding the appropriate y pixel locations to do putpixel. Therefore there's no need to check every y pixel in a column.

There's another technique where you trace the terrain 'across the screen' and store the whole 'line' in a buffer. Compare it to the previous line buffer (from the previous step) and putpixel the columns between the two calculated wavy rows in your viewport. Basically you shouldn't be drawing every row since most of the time the upper third to half of your viewport should be sky.

Other things: Do you really need 640x480 viewport? Fixed point works well. Bilinear filtering makes things look better but requires work to prevent it from bogging down the engine too much. After some distance away, step size could get larger since detail gets blurry at a distance anyways (I've experimented with this but haven't implemented it.) BTW I'm not raycasting clouds... it's just a simple moving perspective texture with some dithering to hide some of the blockiness. If the view haven't moved, there's no need to perform a voxel rendering loop... at those 'idle' times my fps at least doubles (200+). Try reducing the rate of user input so you have more idle frames where no voxel rendering is required.

Holler if you want more ideas.

spellcaster

The problem is that he wants a voxel engine and you're describing a heightmapper. Which is a similar concept, but still a different thing.

A true voxel engine really uses "3d pixels" == VOlumetric piXEL. So you could create tables, chairs, etc. in true 3d with it.

Plucky

Usually when someone talks about a voxel engine, they refer to a terrain mapping engine. It's still uncommon to use voxels for texturing / rendering a 3D object. Furthermore, he mentions using the wave-surfing technique, which is exactly what I'm doing.

And voxels for terrain is still voxels... it's still a 3d 'pixel' where the view is only from above.

gnolam

Plucky:
You are right, kind of... it is supposed to become a terrain renderer, so I'm only dealing with heightmaps. But I think what you're describing is a simple 4 DOF renderer (such as the one Alex Champanard describes in an excellent tutorial on Flipcode, while I'm after a 5-6 DOF (5 will be enough for my needs, roll isn't important) renderer.

I couldn't find a way to use wave surfing for more than 4 DOF, so I had to resort to raytracing... besides, I had done a true voxel renderer (like Spellcaster describes) a while back, which I unfortunately broke some time ago and never botherered to restore, and that used the same technique that I am using now. That one was too slow to use for more than rendering static images though... so this time around I was trying to make it work in realtime.
There is supposed to be a way to do wave surfing in 5 DOF (Appeal claims to have used it for Outcast, but I'll be darned if I can think of a way... maybe if I sat down and thought about it hard for a day or two.

Guess my options are:

  • Make it render only static images, and stop trying to use it for a game.

  • Put this project on ice (again) and do some more work on my main project (Nuclear War)

  • Lock myself in my room and try to use the brain that I'm supposed to have and come up with a decent wave-surfingesque algorithm (and then continue with this project).

Plucky

Here's my take:
The 6 deg of freedom are:
X, Y, Z, rotate left/right, rotate up/down, roll clock/counterclockwise.

As we all know, the standard 4 DOF voxel terrain renderer use the first 4. You're looking for the 5th one. For the wave surfing technique, you need to calculate the initial ray vector and an algorithm to move the ray up one pixel row. For a 4 DOF renderer, this can be precalculated. For a 5 DOF renderer, this needs to be calculated for each frame. I think this is the only real difference. In other words, 5 DOF voxel terrain renderer is done similarly to a 4 DOF one. (This is definitely true for limited up/down pitch... haven't confirmed that this would work all 360 deg. I'll reason it out aloud below.)

Let's try an example.
The delta-Z component of the ray starting at the bottom of the viewport, in my algorithm, is something like: drz0 = -tan(fov/2 - pitch /* pos pitch is upward */) * stepsize; and the change in the voxel scale component (which modifies the how much drz changes as it surfs): d_voxelscale = -2 * drz0 * viewsize_y;
Assume fov = 90. If we pitch down less than 45 deg, then the algorithm above has no problems where drz0 is negative and d_voxelscale is positive. If we pitch down 45 deg, we have a special case where drz0 = infinity and d_voxelscale has no meaning. If we pitch down more than 45 deg, then drz0 would still need to be negative, and d_voxelscale will still need to be positive, so the above equation needs to be modified because the sign needs to be switched. Furthermore drx and dry needs to also switch sign.

I think the key point is that the row of pixels to start the wave surfing would not be the bottom row of the viewport. This is only true if you cannot see the ground directly below you (the standard case with 4 DOF.) Otherwise, you need to calculate which two pixel rows that straddle the ground directly below. Then you need to wave surf two rays in opposite directions (as described in the previous paragraph) filling the pixel column up and down.

Hope this helps.

Mr Moose

Well for what your doing i call that extremely fast heh..

If you want that much freedom in your voxel engine then Yea its probably gonna be slowish, but you could consider making it more blocky, or blocky but bilinear'ed to make smooth, eg skip every 2 pixels (x and y) and you should get 4x the speed.
skip 3 and thats 9x speed, maybe 90fps.

Or you could be lazy and make it just a hightmapper instead heh soon as how there best for games i supose.

But anyways hope you find a solution

uruloki

you probably already know, but in such loops be sure to access your data in the order they are stored in memory, to avoid cache misses.
If you are good, you can unroll your loops and make said 4 iterations instead of 1, using SIMD or whatever.

Or let the hardware do it for you, by using opengl, setup your voxels using display list and don't access directly to the frame buffer.

Thomas Harte

With all due respect, if the question is only one of taking a heightfield input and producing a 3d output with 5/6DOF, and you have hardware 3d support of any description, then the ROAM algorithm is a much better choice than explicitly using voxels. I'm not the best person to explain, but I'll summarise it briefly.

The algorithm works in terms of a recursive structure, the basic building block of which is an isoceles triangle. If you split an isoceles triangle halfway along its longest side and join that point to the opposed point, you get two more isoceles triangles. With ROAM you typically do a coverage calculation that determines how inaccurately a given triangle models the heightfield region it covers based on factors including distance from the camera, although some slightly less mathematically accurate methods better tailored to quick computation are possible, and if the inaccuracy is above a certain threshold you divide and recurse. Obviously you pick the threshold and tweak it based on the polygon complexity your card is happy with and the frame rate you are getting.

There are many considerable optimisations based on considering coherences from one frame to the next, but basically that is it. Its a quite simple but very effective means of adaptive meshing and you've probably seen it in a couple of games if you've played any recently, because its quite popular.

Also : although this is not true for voxels, and therefore strictly irrelevant to the discussion, I wish to remind anyone who hasn't spotted it yet that using compiled vertex arrays is faster than display lists for most purposes

Thread #193089. Printed from Allegro.cc