lookup tables
Ariesnl
Member #2,902
November 2002

I was wondering: are lookup tables for sin, cos and tan still worth the effort, or are they useless on a modern computer?

Perhaps one day we will find that the human factor is more complicated than space and time (Jean-Luc Picard)
Current project: [Star Trek Project ] Join if you want ;-)

Arthur Kalliokoski
Second in Command
February 2005

I'd say that if whatever you're doing isn't terribly memory intensive, the lookup tables would be faster, as long as they can stay cached.

I haven't fiddled with how it's done nowadays with 64-bit OSes that use the SSE registers in lieu of the '87 chip, but the '87 instructions only took about 35 cycles or so to grind out the transcendental stuff; the math library routines that insisted on the last bit of accuracy were considerably slower. With GCC you could tell it to use the CPU instructions instead of the math library with the -ffast-math option.

What are you trying to do anyway? I can't imagine using so many trig functions that it'd be a bottleneck.

They all watch too much MSNBC... they get ideas.

Ariesnl
Member #2,902
November 2002

Just wondering if it is still worth the effort. I once followed a raycasting tutorial that used degree-based sin, cos and tan with a lookup table and linear interpolation to speed things up.
It was in Turbo Pascal, so quite old...
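
In C++ the same idea would look roughly like this (just a sketch; the table size and the names are mine, not from that tutorial):

// Rough sketch of a degree-based sine table with linear interpolation.
#include <cmath>

const double PI = 3.14159265358979323846;
const int TABLE_SIZE = 360;               // one entry per degree
static double sin_table[TABLE_SIZE + 1];  // extra entry so [i + 1] is always valid

void init_sin_table()
{
    for (int i = 0; i <= TABLE_SIZE; ++i)
        sin_table[i] = std::sin(i * PI / 180.0);
}

// angle in degrees (any value); linearly interpolates between table entries
double fast_sin(double degrees)
{
    double a = std::fmod(degrees, 360.0);
    if (a < 0.0)
        a += 360.0;             // fmod keeps the sign of the argument
    if (a >= 360.0)
        a -= 360.0;             // guard against rounding back up to 360
    int i = (int)a;             // lower table index, 0..359
    double frac = a - i;        // fractional degree for interpolation
    return sin_table[i] + frac * (sin_table[i + 1] - sin_table[i]);
}

double fast_cos(double degrees)
{
    return fast_sin(degrees + 90.0);   // cos(x) = sin(x + 90 degrees)
}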

Perhaps one day we will find that the human factor is more complicated than space and time (Jean-Luc Picard)
Current project: [Star Trek Project ] Join if you want ;-)

Chris Katko
Member #1,881
January 2002

Almost never.

We're currently sitting at something like 150-200 instructions' worth of time per uncached memory access.

So unless your data is hot in the cache, it's pointless. How much can you do in 200 instructions?

That being said, I'm sure someone somewhere has already benchmarked whatever use case you're looking at (like sin/cos/etc). So either google it, or simply write your own test case instead of guessing.
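
A throwaway harness like this is enough to get a ballpark number (rough sketch, not a rigorous benchmark; printing the sum keeps the compiler from throwing the whole loop away). Time your table version the same way and compare:

// Quick-and-dirty timing sketch (C++11 <chrono>).
#include <chrono>
#include <cmath>
#include <cstdio>

int main()
{
    const int N = 10 * 1000 * 1000;

    auto t0 = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (int i = 0; i < N; ++i)
        sum += std::sin(i * 0.001);       // angles in radians
    auto t1 = std::chrono::steady_clock::now();

    double ns_per_call =
        std::chrono::duration<double, std::nano>(t1 - t0).count() / N;
    std::printf("std::sin: %.2f ns per call (checksum %f)\n", ns_per_call, sum);
    return 0;
}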

[edit] Here's one example:

https://jfdube.wordpress.com/2011/12/06/trigonometric-look-up-tables-revisited/

Quote:

According to this document, x86 instructions fsin and fcos takes between 65-100 cycles to execute, so we just need to write some code to compute the sine function using less than that.

But if you're investing time into something like this... 99% of the time you're probably wasting your energy. I wouldn't bother unless I was working on a tight loop that was clearly a bottleneck in my game.

-----sig:
“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs
"Political Correctness is fascism disguised as manners" --George Carlin

Arthur Kalliokoski
Second in Command
February 2005

Ariesnl said:

raycasting

So you're needing <horizontal resolution> results <refresh rate> times a second? e.g. 1440 x 75 at the high end = 108,000 calculations per second. Don't worry about it.

[EDIT]

Are you referring to ray casting like Wolfenstein 3D or ray tracing like POV-RAY?

They all watch too much MSNBC... they get ideas.

Bob
Free Market Evangelist
September 2000

(I'm going to assume you first profiled your program, and that calls to sin/cos/tan have been spotted as a problem. Furthermore, I will assume that there are no algorithm changes in your application that would avoid these, etc).

How much precision in the output do you actually require? Or stated another way, how much error can you tolerate in the answer, and for what range of inputs?

That question is by far the most important one for determining which algorithms are appropriate. Is it 10^-4? 10^-16? Only for inputs between -pi and pi? Or for all possible input values?

The compilers' implementations are really fast, given that the output will be exact[1] for all possible inputs. However, if you're willing to tolerate inexact results, there are various algorithms available.

Most implementations work in the following way:
Step 1) Take an educated guess at the answer.
Step 2) Apply some math to refine the answer.
Step 3) Apply more (possibly different) math to further refine the answer.
Steps 4-1000) Same as above. Have as many steps as needed to achieve desired error.

So a good starting point may be to start from an existing implementation (for example, GCC's), then remove as many of the later approximation steps as you're comfortable with.

Usually, the last step is both very expensive, and only serves to give you 1 or 2 more bits of accuracy.
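
To make that concrete (a rough illustration only, not how GCC or any real libm is actually written; real libraries use minimax-fitted coefficients and careful range reduction): think of sine as a truncated series, where every extra term is one more refinement step you can keep or drop depending on your error budget. Assumes the input has already been reduced to [-pi, pi]:

// sine as a truncated odd series: x - x^3/3! + x^5/5! - x^7/7! + x^9/9!
double sin_approx(double x, int terms)    // terms = 1..5
{
    static const double coeff[5] = {
        1.0,                 //  x
        -1.0 / 6.0,          // -x^3 / 3!
        1.0 / 120.0,         //  x^5 / 5!
        -1.0 / 5040.0,       // -x^7 / 7!
        1.0 / 362880.0       //  x^9 / 9!
    };
    double x2 = x * x;
    double power = x;        // x, then x^3, x^5, ...
    double result = 0.0;
    for (int i = 0; i < terms; ++i) {
        result += coeff[i] * power;   // each term is one "refinement step"
        power *= x2;
    }
    return result;
}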

Hope this helps.

[1] 'exact' is defined as calculating the result inside of the library as if there was infinite precision, and then rounding it to the return type (e.g. float).

--
- Bob
[ -- All my signature links are 404 -- ]

Chris Katko
Member #1,881
January 2002

That sounds a lot like the strategy behind Quake 3's magic inverse sqrt function.

https://en.wikipedia.org/wiki/Fast_inverse_square_root
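
For anyone who hasn't seen it, the routine (reformatted here, original comments toned down) is exactly the pattern Bob described: a cheap bit-trick initial guess followed by a single Newton-Raphson refinement step, with the second step commented out because the first was accurate enough:

// Note: the pointer type-punning assumes a 32-bit 'long' and is technically
// undefined behaviour in modern C/C++; today you'd use memcpy or std::bit_cast.
float Q_rsqrt(float number)
{
    const float threehalfs = 1.5F;
    float x2 = number * 0.5F;
    float y  = number;

    long i = * (long *) &y;                 // reinterpret the float's bits as an integer
    i = 0x5f3759df - (i >> 1);              // magic constant: the cheap initial guess
    y = * (float *) &i;                     // back to float

    y = y * (threehalfs - (x2 * y * y));    // one Newton-Raphson refinement step
//  y = y * (threehalfs - (x2 * y * y));    // second step, removed in the original

    return y;
}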

And I agree 100%. If you really want to "optimize the !@$!" out of something, you need to start looking at it from a numerical analysis standpoint. What are the input and output ranges, and what deviation from the correct answer is acceptable?

Books can be had on that topic for really cheap used.

https://en.wikipedia.org/wiki/Numerical_analysis

I bought this one years ago:

https://www.amazon.com/Numerical-Methods-Scientists-Engineers-Mathematics/dp/0486652416

There are a hundred ways to skin a cat, and applying techniques in different ORDERS (let alone different combinations) will lead to a variety of results.

But the point is, instead of a simple hardcoded lookup, a modern implementation will involve a lookup, some approximation function, and maybe another approximation function on top of that. (See the Quake 3 function link.) That way you're making use of CPU time instead of just fetching a value from RAM and stalling until the next one arrives if it's not in cache. You really want to pack things into single cachelines whenever possible (64 bytes on almost all systems these days).

[edit]

Another option is to take advantage of cachelines in a different way. Let's say you really just need X multiplied by a value from your lookup. Well, instead of multiplying ONE X at a time, why not multiply a whole ROW of X's? That's the principle behind data-oriented design (DoD), and it's practically REQUIRED for any kind of PS3 programming. You pack related data together, and then operate on many items at the same time using the same function. So if you were using a blind lookup table, that table would stay hot in cache the entire time you're using it, instead of doing ONE lookup, then running the next algorithm, then the next, and then repeating all of that for the NEXT OBJECT.

This is where traditional OOP education becomes "slow as dirt". Treating each object as an independent unit by definition means the compiler can't do anything to help you.

//to clarify, you go from this:
obj1.get_stuff();         //caches get_stuff() data/code
obj1.think_about_stuff(); //get_stuff no longer needed.
obj1.apply_stuff();       //think_about_stuff no longer needed.

obj2.get_stuff();         //apply_stuff no longer needed, start over.
obj2.think_about_stuff();
obj2.apply_stuff();

//to this:

get_stuff(obj1...objN); //all get_stuff routines are hot in cache after the FIRST object is called
think_about_stuff(obj1...objN); //ditto
apply_stuff(obj1...objN); //ditto
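
In plain C++ terms that usually means structure-of-arrays instead of array-of-structures (another sketch, field names invented for illustration):

#include <vector>

// AoS: each object's fields are interleaved in memory, so a loop that only
// touches 'angle' still drags x, y and health through the cache with it.
struct EntityAoS {
    float x, y;
    float angle;
    float health;
};

// SoA: each field is packed contiguously, so a pass over 'angle' streams
// through one tight array and your lookup table stays hot the whole time.
struct EntitiesSoA {
    std::vector<float> x, y;
    std::vector<float> angle;
    std::vector<float> health;
};

void rotate_all(EntitiesSoA &entities, float delta)
{
    for (float &a : entities.angle)   // one field, one pass, one cacheline stream
        a += delta;
}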

Or, for a full lecture on how DoD rules from an industry great, game programmer Mike Acton:

[embedded video]

p.s. I'm so happy that Bob posted here.

-----sig:
“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs
"Political Correctness is fascism disguised as manners" --George Carlin

bamccaig
Member #7,536
July 2006

Every time Bob posts I feel like God himself has entered the forums to spread universal wisdom. And then he's gone like a ghost.

Audric
Member #907
January 2001

Note that the simple act of having a lookup table in memory slows down other pieces of your code. Two pieces of unrelated data (other variables) stored before and after the table may no longer be on the same memory page when a piece of code requires both, so you get more page faults.

Neil Roy
Member #2,229
April 2002

Someone should write some code to do some benchmarks; I would be curious to see real results.

I may do this myself, as I have a new project I am going to work on, a resurrection of an old one that used lookup tables, and I am curious whether they are still viable. The link Chris shared would seem to strongly indicate they are, if done right. I know that in my own project I could tolerate a high error without problems.

---
“I love you too.” - last words of Wanda Roy
