AGAIN floats
Frank Drebin

i want to write a function

#include <math.h>

const float float_error = 0.000001f;
bool floats_are_equal(float a, float b)
{
   return (fabs(a - b) < float_error); /* true if a and b differ by less than the tolerance */
}

with float_error=0.000001 because floats have 6 decimals, or is there a better value i can take for this (do floats have a maximum inaccuracy i should account for)?

Evert
Quote:

because floats have 6 decimals

Not exactly. Floats can store numbers with a certain precision (don't know if that's six digits or more, but that doesn't really matter); that precision does not depend on the order of magnitude of the number (because it is a floating-point number).

Depending on your needs, 0.000001 could be a sufficient check. It really depends on what you want to do with it.
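A minimal sketch of why the magnitude matters (assuming IEEE-754 single precision; nextafterf is standard C99 and returns the next representable float):

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* near 1.0, neighbouring floats are about 1.2e-7 apart,
       so a tolerance of 0.000001 works fine... */
    float a = 1.0f;
    float b = nextafterf(a, 2.0f);               /* the very next float after 1.0f */
    printf("%d\n", fabs(a - b) < 0.000001f);     /* 1: counted as equal */

    /* ...but near 1000000, neighbouring floats are 0.0625 apart,
       so the same tolerance rejects even adjacent values */
    float c = 1000000.0f;
    float d = nextafterf(c, 2000000.0f);
    printf("%d\n", fabs(c - d) < 0.000001f);     /* 0: counted as not equal */
    return 0;
}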

spellcaster

First of all, you need to turn around the operator, or it'll return true if the floats are not equal.
And I'd say use something which makes sense in your game. If you're using 4 decimal digits (internally), use 0.00005 as the error.

damage

If you are using C++, do this.

#include <iostream>
#include <limits>

using namespace std;

...

cout << "Float has " << numeric_limits<float>::digits10 << " digits" << endl;

Secondly, don't use float. Use double, you get more precision for free, I am pretty sure.

Evert

doubles may be slower than floats (usually not an issue on modern hardware).

gnolam
damage said:

Use double, you get more precision for free, I am pretty sure.

It's a trade-off between accuracy and speed.

Frank Drebin said:

with float_error=0.000001 because floats have 6 decimals, or is there a better value i can take for this (do floats have a maximum inaccuracy i should account for)?

http://www.infosys.utas.edu.au/info/documentation/C/CStdLib.html#float.h
For further information, consult your local float.h ...
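For instance, you can print a few of them directly (a minimal sketch; FLT_DIG, FLT_EPSILON and DBL_DIG are standard float.h constants):

#include <stdio.h>
#include <float.h>

int main(void)
{
    printf("FLT_DIG     = %d\n", FLT_DIG);       /* decimal digits of precision: 6 */
    printf("FLT_EPSILON = %g\n", FLT_EPSILON);   /* smallest x with 1.0f + x != 1.0f */
    printf("DBL_DIG     = %d\n", DBL_DIG);       /* 15 for doubles */
    return 0;
}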

Chris Katko

To my knowledge, doubles will normally be slower than floats until we move to 64-bit processors (doubles are 64-bit). Unless in the case of using MMX, but then you have to reset the MMX registers and that takes up most of the gain, unless you're doing batches of floating-point numbers.

I may be wrong. I haven't been able to code in a long time so it may or may not be correct.

damage
Quote:

doubles may be slower than floats (usually not an issue on modern hardware).

That's true, but the vast majority of us will be programming on an x86 Pentium or greater. From the Pentium onward, double and float have exactly the same speed, AFAICT.

A J

does the libc have #DEFINEs for the floating-point error of operations on floats?

Frank Drebin

the question isn't whether to use floats or doubles.
i'm going to use floats. but what value should i take??? (p.s. is FLT_EPSILON in float.h the maximum inaccuracy of floats?)

Mars
Quote:

the question isn't wehter using floats or doubles.

Who are you blaming? They stayed relatively close to the topic! ;D

Quote:

i'm going to use floats. but what value should i take

You already have got some good replies. It depends on your application. If you're programming a game, then you need to take into consideration the accuracy of your engine.

X-G

I'm still trying to figure out why exactly you need to do this comparison. So far, I've never had to do == checks on floats or doubles, simply because I've never come across any situation that would require it.

Plucky

Try:

#include <math.h>
#include <float.h>

bool floats_are_equal(float a, float b)
{
   /* caveat: returns false for a == b == 0, since 0.0/0.0 is NaN */
   return (fabs(a-b)/fabs(a) <= FLT_EPSILON) &&
          (fabs(a-b)/fabs(b) <= FLT_EPSILON);
     // && can be replaced with || for a more relaxed comparison
}

Should solve problems with scale issues and over/underflow issues.

Frank Drebin

i use this because the coordinates of my objects are stored in floats, and because of movement and collision detection i have to compare them!!!

--> so there are at most 3 places before the point, and 3 places left behind the point -> so i should take a value like 0.001 ?!?

X-G

Compare then, yes - but why == ? Collision detection usually just involves <= and >= ...

Frank Drebin

that's true, but for movement (while (x_pos != x_pos_to_reach) move_x();) or something like this, sometimes ==/!= is nice.

X-G

You could use a threshold for that purpose and compare with </>.
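For example (a minimal sketch; the reached helper and the 0.001f threshold are made up, pick a threshold that matches your game's scale):

#include <math.h>

/* close enough to the goal to count as arrived? */
int reached(float x_pos, float x_pos_to_reach, float threshold)
{
    return fabs(x_pos - x_pos_to_reach) < threshold;
}

/* usage: while (!reached(x_pos, x_pos_to_reach, 0.001f)) move_x(); */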

Fladimir da Gorf

You can use, for example:

if( x_pos < x_pos_to_reach )
 while( x_pos < x_pos_to_reach ) move_x();
else while( x_pos > x_pos_to_reach ) move_x();

I've never, ever yet needed to compare floats, and I don't think I'll ever need to.

Frank Drebin

yes but i think mine is a bit shorter (easier).
i don't like floats very much either, but you have to use them for smooth movement (except scaled ints)

Fladimir da Gorf

"i don't like floats very much too "

Who said I didn't like floats? When was the last time I didn't use floats (or doubles) for movement? I don't like comparisons between them, just because you may not reach the destination, depending on the speed of the object.

"yes but i think mine is a bit shorter (easier)."

Yes, but mine works.

Andrei Ellman
evert said:

Quote:
--------------------------------------------------------------------------------
because floats have 6 decimals
--------------------------------------------------------------------------------
Not exactly. Floats can store numbers with a certain precision (don't know if that's six digits or more, but that doesn't really matter); that precision does not depend on the order of magnitude of the number (because it is a floating-point number).

An IEEE-754 floating point number is represented by a sign-bit, an exponent, and a mantissa. The precision of a floating point number depends only on the number of bits in the mantissa-portion, and the range of its order of magnitude depends on the number of bits in the exponent. So talking about the "number of decimals" does not make sense with floating point numbers (unless the exponent is always the same - hence 'fixed-point numbers'), but the "number of significant digits" does.

A floating point number is worked out from its stored representation using

( 1 - 2*sign ) * 2^(exp - exp-bias) * ( 1 + mantissa / mantissa-divisor )

The exp-bias and mantissa-divisor are used to maximise the number of possible values that can be stored with this representation. The +1 for the mantissa and the exp-bias are used to normalise the number.

A single precision floating point number (a 32-bit 'float') uses 1 sign-bit, 8 exponent-bits and 23 mantissa-bits, so exp-bias would be 127 (2^8 - 1 - 2^(8-1)) and mantissa-divisor would be 8388608 (2^23).
A double precision floating point number (a 64-bit 'double') uses 1 sign-bit, 11 exponent-bits and 52 mantissa-bits, so exp-bias would be 1023 (2^11 - 1 - 2^(11-1)) and mantissa-divisor would be 4503599627370496 (2^52).

The number of significant digits in base 10 a floating point number can store is calculated as:
ceil(log10(2^(mantissa_bits+1)));
The +1 is there because the floating point format assumes the first bit of the mantissa is always 1 (an optimisation that is possible in base 2, since a normalised mantissa always starts with a 1-bit).
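A minimal sketch of pulling those three fields out of a float and rebuilding the value with the formula above (assumes an IEEE-754 float; NaNs, infinities and denormals are ignored):

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <math.h>

int main(void)
{
    float f = -6.25f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);              /* reinterpret the float's 32 bits */

    unsigned sign     = bits >> 31;              /* 1 sign bit */
    unsigned exponent = (bits >> 23) & 0xFF;     /* 8 exponent bits */
    unsigned mantissa = bits & 0x7FFFFF;         /* 23 mantissa bits */

    /* ( 1 - 2*sign ) * 2^(exp - 127) * ( 1 + mantissa / 2^23 ) */
    double rebuilt = (1.0 - 2.0 * sign)
                   * pow(2.0, (int)exponent - 127)
                   * (1.0 + mantissa / 8388608.0);

    printf("sign=%u exp=%u mantissa=%u rebuilt=%g\n",
           sign, exponent, mantissa, rebuilt);   /* rebuilt = -6.25 */
    return 0;
}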

For more information and much better explanations than what I've written here, do some Googling.

etwinox said:

To my knowledge, doubles will normally be slower than floats until we move to 64-bit processors (doubles are 64-bit). Unless in the case of using MMX, but then you have to reset the MMX registers and that takes up most of the gain, unless you're doing batches of floating-point numbers.

MMX is used to do similar operations in parallel on sets of data, but MMX only works for integers (or fixed point numbers, with tweaking of the input/output). MMX does not work with floats/doubles.

[edit2:] SSE, a newer extension than MMX, is like MMX but works with floats. SSE2 (so far only found on Pentium 4s) is similar but uses doubles.

damage said:

Quote:
--------------------------------------------------------------------------------
doubles may be slower than floats (usually not an issue on modern hardware).
--------------------------------------------------------------------------------
That's true, but the vast majority of us will be programming on an x86 pentium or greater. From the pentium onward double and float have exactly the same speed, AFAICT.

I think that internally, the Pentium will perform operations at the same speed on doubles and floats as long as the values are stored within the CPU's FPU registers (not sure on this). What I am sure of is that on a 32-bit processor (IA32) such as the Pentium and many of its successors, there is only a 32-bit bus between the CPU and memory. The CPU can read/write a 32-bit float in one memory fetch, but it takes two memory fetches to read/write a 64-bit double, and three to read an 80-bit number (question: is this what a 'long double' is, or does it have even more bits?). An IA64 processor can work with doubles and floats at the same speed (unless there's some sort of optimisation that involves reading in two adjacent floats in memory at once).

Also, the x87 FPU (the floating point component of any x86-based chip) uses 32 bits for short floating-point numbers and 80 bits for long floating-point numbers. I don't think it can natively work with 64-bit numbers without expanding them to 80 bits. This means that the x87 FPU will not generate overflows or underflows that would be generated by doing the same calculations on a 64-bit FPU.

Plucky said:

Try:
bool floats_are_equal(float a, float b)
{
   return (fabs(a-b)/fabs(a) <= FLT_EPSILON) &&
          (fabs(a-b)/fabs(b) <= FLT_EPSILON);
     // && can be replaced with || for a more relaxed comparison
}
Should solve problems with scale issues and over/underflow issues.

The above code uses divides. That will significantly slow down any code that does a lot of comparisons. IMO, that's just overkill!

[edit1:] perhaps an optimised version of this could be written that works by adding to or subtracting from the bits of the IEEE-754 float's exponent instead of using divides, but that would make it non-portable.

AE.

Plucky
Quote:

The above code uses divides. That will significantly slow down any code that does a lot of comparisons. IMO, that's just overkill!

The code, or something like it, is necessary if you want to compare floats across their whole dynamic range. Otherwise, something like what Frank first suggested would work, if you knew what range all the float comparisons would operate in.

[edit] Perhaps we should add floating point questions to the list of questions that pop up weekly or bi-weekly. :) I bet we already have a dozen posts that go through the details of the floating point implementation.

Andrei Ellman

Plucky: See the edit1 to the above post for a possible means of optimising it.

Plucky

I'm not sure playing with the bits and probably adding one or two conditionals to handle special cases would be faster. Someone would of course need to code both methods and test them to be sure.

Korval
Quote:

that's true but for movement (
while (x_pos!=x_pos_to_reach) move_x();
) or something like this sometimes ==/!= is nice.

You aren't looking for a generalized ==/!= floating-point test. What you want is to test whether or not the given Vec2 is within a particular box. The size of the box depends on the accuracy you care about.

If this is a 2D game, the accuracy you care about is to the nearest pixel. So, just do an "(int)" cast operation to the floats in question.

If this is a 3D game (or, for some reason, you need better-than-per-pixel-accuracy), then I would suggest a more flexible system than, "Is he at position Vec2 yet?" I would, instead, suggest that all movements be normalized on the range 0.0f-1.0f.

Let's say you have the initial position Vec2i, and the final position Vec2f. The direction towards the final position is Normalize(Vec2f-Vec2i). Let's call that Vec2Dir. The distance between these two points is Dist. Given that this is a game, you probably have some speed that the object moves. Let us call that Speed.

The objective now is to create a function F(t), such that F(0.0f) == Vec2i, and F(1.0f) == Vec2f. Why is this good? Because it makes telling when you're done with the movement trivial. You just clamp 't' such that it is never greater than 1.0f. When t==1.0f (a perfectly fine comparison, considering that your clamp function just set it to 1.0f), the movement is finished.

So, what is F(t)? Well, F(t) = Vec2i + (Vec2Dir * Dist * t). Simple enough, right?

The Vec2Dir gets it pointing in the right direction. Dist * Vec2Dir + Vec2i == Vec2f (given the definitions of Dist and Vec2Dir). And t=0 means that F is Vec2i. Just what we need.

Of course, it isn't precisely what you need, because the time scale of t is all wrong. You have some speed for the object. And you're taking particular timesteps. Therefore, you need to be able to feed in a time delta into F(t). Which means, you need a new function t(time) such that you can build the useful function G(time).

Well, given the speed Speed of the object, you know that it will take TimeScale = Dist / Speed seconds to go from the initial position to the final. As such, the function t(time) = time / TimeScale.

Therefore:

G(time) = Vec2i + (Vec2Dir * Dist * (time / TimeScale));

What if the speed changes during the travels? Well, you will have to reset the variables every time the speed changes. Vec2i becomes the current location, and you have to recompute Dist and TimeScale (Vec2Dir shouldn't have changed, but you get it for free).

Obviously, if the object isn't going in a straight line from point A to point B, that's a different issue. In that case, use the box method, and pick whatever accuracy you need (tenths, 0.001, etc). Do not, however, think of it as a generalized floating-point equality test. This is to be used only for the purpose of determining whether a particular Vec2 is at a particular position in the world. This function should be used only for that.
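A minimal sketch of this scheme in C (the Vec2 struct and position_at are names I made up; assumes straight-line movement at a constant speed):

#include <stdio.h>
#include <math.h>

typedef struct { float x, y; } Vec2;

/* G(time) = Vec2i + Vec2Dir * Dist * (time / TimeScale), with t clamped to 1 */
Vec2 position_at(Vec2 start, Vec2 end, float speed, float elapsed)
{
    float dx = end.x - start.x, dy = end.y - start.y;
    float dist = sqrtf(dx * dx + dy * dy);    /* Dist */
    if (dist == 0.0f) return end;             /* nothing to do */

    float time_scale = dist / speed;          /* seconds for the whole trip */
    float t = elapsed / time_scale;           /* t(time) */
    if (t > 1.0f) t = 1.0f;                   /* clamp: t == 1.0f is now a safe == test */

    /* (dx/dist, dy/dist) is Vec2Dir; kept unsimplified to mirror the derivation */
    Vec2 p = { start.x + (dx / dist) * dist * t,
               start.y + (dy / dist) * dist * t };
    return p;
}

int main(void)
{
    Vec2 a = { 0.0f, 0.0f }, b = { 3.0f, 4.0f };   /* dist = 5 */
    float s;
    for (s = 0.0f; s <= 3.0f; s += 1.0f) {
        Vec2 p = position_at(a, b, 2.0f, s);       /* speed = 2 units/second */
        printf("after %gs: (%g, %g)\n", s, p.x, p.y);
    }
    return 0;
}

The clamp is what makes the == test against 1.0f safe, exactly as described above: the value was assigned, not computed.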

damage

What I am sure of is that on a 32-bit processor (IA32) such as the Pentium and many of its successors, there is only a 32-bit bus between the CPU and memory, so the CPU can read/write a 32-bit float in one memory fetch, and it takes two memory fetches to read/write a 64-bit double, and three to read an 80-bit number.

That's true; I was wrong. I should have qualified my original statement, I suppose.

But I guess with a lot of FP code the actual operations take longer than the memory accesses... I'm just confused because I have tried a lot of my code with both doubles and floats and not noticed any significant differences.

(question: is this what a 'long double' is or does it have even more bits?).

I don't think so; I thought that x86 supported 3 float types: single precision (32-bit), double precision (64-bit) and double extended precision (80-bit, which is the same as the so-called "long double").

Korval
Quote:

What I am sure of is that on a 32-bit processor (IA32) such as the Pentium and many of it's successors

As an aside, IA32 doesn't exist. Intel has chips that use the IA64 instruction set, but this is not based on a pre-existing IA32 instruction set. If you want to refer to the instruction set of 32-bit Pentiums, the correct terminology would be x86-32.

Thomas Fjellstrom

Really? I hear a lot of people call the "x86-32" platform IA32. All it stands for is "Intel Architecture 32"... Now how can that be wrong?

Bob
Quote:

As an aside, IA32 doesn't exist.

Make sure you tell Intel about this ;D ;D ;D

Quote:

What I am sure of is that on a 32-bit processor (IA32) such as the Pentium and many of it's successors, there is only a 32-bit bus between the CPU and memory, so the CPU can read/write a 32-bit float (32-bits) in one memory-fetch, and it takes two memory-fetches to read/write a 64-bit double to memory, and three to read an 80-bit number.

That's a very, very simplistic view of what's going on. On the Pentium Pro and up, the memory bus is 64 bits wide(*). However, the CPU's MMU doesn't talk to memory directly, but to the cache first. Cache is read in lines of 32, 64 or 128 bytes (Pentium to P4).

What happens is that when you use doubles instead of floats, your memory usage (roughly) doubles. This means the CPU can keep fewer variables in cache, which will force it to spill more of them into main RAM, which is slower.

(*) multiple interleaved RDRAM channels are combined to form a 64-bit bus, but RDRAM itself is only 8 or 16-bits wide.

Andrei Ellman
damage said:

(question: is this what a 'long double' is or does it have even more bits?).

I don't think so; I thought that x86 supported 3 types of float types: single precision (32bit), double precision (64bit) and double extended precision (80bit, and which is the same as the so-called "long double").

I don't think the x87 (the part of the x86 that does FPU ops, which was a separate chip on 386 and 486SX based PCs) can natively support 64-bit precision. It only does 32 and 80 bit precision (this is true of the 387, 487 and early Pentiums; I'm not sure about the modern Pentiums, Athlons, etc.). Whereas C only supports floats and doubles (I'm not sure how well 'long doubles' are supported in standard C, or if they only appear as language extensions). This means there's a possibility that operations that would overflow or underflow in 64 bits work fine in 80 bits. So if code that uses doubles is ported from the x87 to a platform that supports 64-bit maths as easily as (or better than) 80-bit maths, it will produce slightly different results, or in the worst case, underflow or overflow where the x87 version won't. Is there something in the ANSI C standard that says calculations involving doubles always have to be done with 64-bit precision? And if the x87 only natively supports 80 bits, how would the compiler write code to comply with the standard? And do compilers comply with the standard and sacrifice efficiency as a result, or do they cheat and do the maths in 80-bit mode and return the result truncated to 64 bits?

float tip of the day:
If preserving accuracy is really important, it is best to write "c*a + c*b" instead of "c*(a+b)". This is because additions and subtractions are prone to losing accuracy if the exponents of a and b are different (the low-order bits of the smaller value become insignificant when its exponent is converted to the exponent of the larger value), so it's best to save additions and subtractions for as late as possible in the calculation. Of course this is slower as it uses an extra multiply, but as usual, it depends on whether you want speed or accuracy.
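A minimal sketch of the exponent-mismatch effect behind the tip (assumes IEEE-754 single precision, where floats carry 24 significant bits):

#include <stdio.h>

int main(void)
{
    /* 1.0f is below the precision available at 2^25,
       so the addition rounds away entirely */
    float big = 33554432.0f;        /* 2^25 */
    float sum = big + 1.0f;
    printf("%d\n", sum == big);     /* prints 1: the +1.0f was lost */
    return 0;
}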

AE.

Frank Drebin

one closing question:
0 can be represented exactly by a float !!?
(so if (float_var==0) works)

gnolam

Yes, 0 can be represented exactly (in more than one way, even). But if you had read the previous threads, you would have noticed that it isn't so much the fact that some numbers can't be represented exactly that should make you wary of what you're doing; it's that you will have problems reaching that exact value. So the only time I would use "if (float_var == 0)" is if I had set float_var to 0 myself.
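A minimal sketch of that problem (0.1 has no exact binary representation, so the residue is almost certainly nonzero):

#include <stdio.h>

int main(void)
{
    float x = 1.0f;
    int i;
    for (i = 0; i < 10; i++)
        x -= 0.1f;                 /* ten steps of 0.1 "should" land on 0 */
    printf("x = %g\n", x);         /* a tiny nonzero residue, not 0 */
    printf("%d\n", x == 0.0f);     /* prints 0 */
    return 0;
}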

Evert

To be more precise, there is -0.0f and +0.0f, which behave identically in computations.
But as a rule of thumb, you should never use == when dealing with floats. Say x and y tend to the same value a. At some point, x = a-e1 and y = a-e2. In this case, x and y will be close, but you can't count on e1==e2 being true, even if they would be equal in exact arithmetic.
But that's just what everyone has been saying, worded differently. :)

Frank Drebin

o.k

Korval
Quote:

Make sure you tell Intel about this

Have I mentioned before that I hate Intel? Can't these people stop making up names for things? The whole point of calling their 64-bit architecture IA64 is so that it is immediately differentiated from x86 chips. Calling an x86 chip an IA32 chip violates this rather important distinction.

Thomas Fjellstrom

The IA32 name has been around for years ::)

Zaphos
A.E. said:

float tip of the day:
If perserving accurcy is really important, it is best to write "c*a + c*b" instead of c*(a+b).

Will an optimizing compiler change "a*b+a*c" to "a*(b+c)" automatically? It seems like such a switch would be logical to make ???

Andrei Ellman
Zaphos said:

Will an optimizing compiler change "a*b+a*c" to "a*(b+c)" automatically? It seems like such a switch would be logical to make

Good point. I picked up the tip from an e-book about x86 machine code. An assembler doesn't normally do optimisations, so there would be no risk of it being optimised away. As for writing the code in C, the compiler has no means of taking a hint that accuracy is more important than speed. In GCC, you can switch off individual optimisations. It might be possible to switch off the optimisations that re-arrange equations for speed, but that switch might also prevent other optimisations from being done, so you would have to order your mathematical equations manually in the form that gives the most speed (or the most accuracy, depending on what you want).

Korval said:

Have I mentioned before that I hate Intel? Can't these people stop making up names for things?

They've had a long history of being awkward with their naming conventions.

The Intel 80386 came in two flavours, the SX and DX. The same goes for the 80486, but there SX and DX mean different things. I'm not quite sure what it means for 80386s (might have something to do with a 16-bit memory-bus on the SX's), but on an 80486, the DX has a built-in maths co-processor (FPU), while with the SX a separate FPU (the 80487) must be installed. The '80' bit was dropped because people just pronounced the last 3 digits, so they were re-named the i386 and i486. Somehow, the i586 wasn't good enough for the Pentium, so the name "Pentium" was used ('pent' meaning five). This meant that the next generation of CPUs would have to be called the "Hexium" or, even worse, the "Sexium". Red-faced Intel marketing staff then came up with the name "Pentium Pro" for the i686 line of CPUs.

The problem with the i686 was that although it greatly improved the execution of 32-bit instructions, it was actually slower at executing 16-bit instructions. Intel had made a boo-boo and assumed most people would be running mostly 32-bit code by then (1996), but Windows 95 - although a 32-bit OS - still had huge chunks of 16-bit code in it, so it actually ran slower on a Pentium Pro than on a Pentium. A year later, they created the Pentium MMX. As Intel were worried about Windows 95 and its 16-bit code, they based the Pentium MMX on the i586 instead of the i686 (tho' I'm not really sure on this). Finally in 1998, the Pentium Pro and MMX branches were merged and the Pentium II came into being. Either they had sped up the 16-bit instruction execution, or they realised that Windows 98, which was about to be released, had a greater proportion of 32-bit code than Windows 95. By now, the ix86 naming convention had been abandoned, and they just kept adding numbers to the end of their Pentiums, and each new Pentium had an additional name (eg: PIII = "Katmai"). Since the Pentium II, Intel have been releasing cheaper versions of their CPUs with smaller caches, known as Celerons.

Rival chip-maker AMD are also guilty of producing misleading names. IIRC they re-named their early Athlons to Durons and released a new, faster batch of Athlons.

My god, I'd hate to see what these guys call their kids.

AE.

Evert
Quote:

(might have something to do with a 16-bit memory-bus on the SX's)

IIRC, that's exactly the difference between a 386SX and a DX, yes.
Interestingly enough, the 8086 an d 8088 (which we probably should call 86 and 88 if we call the 80386 the 386, but whatever) have the same sort of difference: the 8086 was a 16 bit processor with a 16 bit bus. The 8088 was a 16 bit processor with a 8 bit bus. The difference between the 80186 and 80188 (which haven't been around long) is similar.

Bob
Quote:

As for writing the code in C, the compiler has no means of taking a hint that accuracy is more important than speed.

GCC has -ffast-math and -fieee-fp (or something like that). -ffast-math implies -fno-ieee-fp.

Quote:

it was actually slower at executing 16-bit instructions.

Not really. The Pentium Pro had two Achilles heels:
- Mixing operations of different sizes on the same registers would cause partial register stalls of 6 cycles (the stall penalty was reduced on the P2, but I'm not sure by how much).
- Prefixes would cost one additional cycle to process (most prefixes became 'free' on the P2).

There's nothing that's 16-bit specific; it's just that the code people wrote (or that compilers outputted) wasn't adapted to the CPU (or vice-versa, depending on your point-of-view :) )

Quote:

The IA32 name has been around for years

Yeah, around the same time Itanic was announced, back in 1995 or so.

Thomas Fjellstrom

:o I could swear.. Oh well :D Hmm... What to buy, an Itanic or a Quad Opteron with 6GB of memory *drool*

Mars

BTW I just found the GCC warning option that always warns if you try to use == with floats. :)

Andrei Ellman
Mars said:

BTW I just found the GCC warning option that always warns if you try to use == with floats.

It's -Wfloat-equal
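For instance (a minimal sketch; the file name is made up):

/* eqtest.c - compile with: gcc -Wfloat-equal -c eqtest.c */
int same(float a, float b)
{
    return a == b;   /* gcc: warning: comparing floating point with == or != is unsafe */
}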
