What Every CS’tist Should Know About Floating-Point

A simple example in JavaScript

Open the Web Console (Ctrl+Shift+K for Firefox) and type:

var a = 0.2;
var b = 0.3;
a+a+a;
b+b+b;

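The results are 0.6000000000000001 and 0.8999999999999999, not 0.6 and 0.9. Neither 0.2 nor 0.3 can be represented exactly in binary floating point, so each literal is stored as the nearest representable double, and each addition rounds again. The accumulated error shows up in the printed result.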
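Because of this, comparing floating-point results with === is fragile. A common workaround is to compare with a relative tolerance. The sketch below uses a hypothetical helper, nearlyEqual, whose default tolerance of 4 × Number.EPSILON is an assumption chosen for illustration, not a value prescribed by IEEE 754.

```javascript
// Hypothetical helper: compare two doubles with a relative tolerance.
// The default tolerance (4 * Number.EPSILON) is an illustrative assumption;
// pick a tolerance that suits the computation at hand.
function nearlyEqual(x, y, relTol = 4 * Number.EPSILON) {
  if (x === y) return true; // exact match, including equal infinities
  const diff = Math.abs(x - y);
  const scale = Math.max(Math.abs(x), Math.abs(y));
  return diff <= relTol * scale; // NaN inputs make this false
}

const sixTenths = 0.2 + 0.2 + 0.2;
const nineTenths = 0.3 + 0.3 + 0.3;

console.log(sixTenths === 0.6);            // false
console.log(nearlyEqual(sixTenths, 0.6));  // true
console.log(nearlyEqual(nineTenths, 0.9)); // true

// Printing 17 significant digits exposes the double actually stored for 0.2.
console.log((0.2).toPrecision(17));        // 0.20000000000000001
```

A relative tolerance scales with the magnitude of the operands, which an absolute cutoff such as 1e-9 does not; values very close to zero may still need an absolute floor.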


Almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every operating system must respond to floating-point exceptions such as overflow.

There are two different IEEE standards for floating-point computation. Both are described in terms of a base β (the radix) and a precision p (the number of base-β digits in the significand). IEEE 754 is a binary standard that requires β = 2, with p = 24 for single precision and p = 53 for double precision [IEEE 1987]. It also specifies the precise layout of bits in single- and double-precision numbers. IEEE 854 allows either β = 2 or β = 10 and, unlike 754, does not specify how floating-point numbers are encoded into bits [Cody et al. 1984]. It does not require a particular value for p; instead, it specifies constraints on the allowable values of p for single and double precision.
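JavaScript numbers are IEEE 754 doubles, so β = 2 and p = 53 can be observed directly in the console. In the sketch below, the helper measurePrecision is hypothetical, written only for this example: it halves an increment until adding it to 1 no longer changes the result. Passing Math.fround, which rounds a value to the nearest single-precision number, makes the same loop report p = 24.

```javascript
// JavaScript numbers are IEEE 754 doubles: beta = 2, p = 53.
// Number.EPSILON is the gap between 1 and the next double, i.e. 2^(1 - p).
const P_DOUBLE = 53;
console.log(Number.EPSILON === 2 ** (1 - P_DOUBLE)); // true

// Every integer up to 2^53 is exact, but 2^53 + 1 needs 54 bits.
// It is a tie between 2^53 and 2^53 + 2 and rounds to the even one, 2^53.
console.log(2 ** 53 + 1 === 2 ** 53);                 // true
console.log(Number.MAX_SAFE_INTEGER === 2 ** 53 - 1); // true

// Math.fround rounds to single precision (p = 24): the same effect at 2^24.
console.log(Math.fround(2 ** 24 + 1) === 2 ** 24);     // true
console.log(Math.fround(2 ** 24 + 2) === 2 ** 24 + 2); // true

// Hypothetical helper: find the first p for which 1 + 2^-p rounds back to 1.
// 2^-p is exactly half the spacing of numbers just above 1, so the tie
// rounds to even (1) precisely when p equals the significand precision.
function measurePrecision(round = (x) => x) {
  let p = 1;
  while (round(1 + 2 ** -p) !== 1) p++;
  return p;
}

console.log(measurePrecision());            // 53
console.log(measurePrecision(Math.fround)); // 24
```

The loop works because 1 + 2^-p is always computed exactly in double precision for p ≤ 52 before any single-precision rounding is applied by Math.fround.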

The IEEE 754 Standard for Floating-Point Arithmetic (IEEE 754) is a technical standard for floating-point computation established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE). Many hardware floating-point units implement it. Its 2008 revision, IEEE 754-2008, published in August 2008, includes nearly all of the original IEEE 754-1985 standard and the IEEE Standard for Radix-Independent Floating-Point Arithmetic (IEEE 854-1987); the standard has since been revised again as IEEE 754-2019. The international standard ISO/IEC/IEEE 60559:2011 (with content identical to IEEE 754-2008) has been approved for adoption through JTC1/SC 25 under the ISO/IEEE PSDO Agreement[1] and published.[2]

The standard defines:

arithmetic formats: sets of binary and decimal floating-point data, which consist of finite numbers (including signed zeros and subnormal numbers), infinities, and special “not a number” values (NaNs)
interchange formats: encodings (bit strings) that may be used to exchange floating-point data in an efficient and compact form
rounding rules: properties to be satisfied when rounding numbers during arithmetic and conversions
operations: arithmetic and other operations on arithmetic formats
exception handling: indications of exceptional conditions (such as division by zero, overflow, etc.)
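The interchange encoding and the special values in this list can be inspected from JavaScript. The sketch below assumes the binary64 layout (1 sign bit, 11 exponent bits with bias 1023, 52 fraction bits) and uses a hypothetical helper, decodeDouble, built on the standard DataView API. JavaScript does not expose the standard's exception flags; operations such as overflow or division by zero simply return the default results (infinities, NaN, signed zeros), as the last lines show.

```javascript
// Hypothetical helper: split a double into its binary64 bit fields
// and classify it. Assumes 1 sign bit, 11 exponent bits, 52 fraction bits.
function decodeDouble(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);            // big-endian by default
  const bits = view.getBigUint64(0); // read back with the same byte order
  const sign = Number(bits >> 63n);
  const exponent = Number((bits >> 52n) & 0x7ffn); // biased exponent
  const fraction = bits & 0xfffffffffffffn;        // low 52 bits
  let kind;
  if (exponent === 0x7ff) kind = fraction === 0n ? "infinity" : "NaN";
  else if (exponent === 0) kind = fraction === 0n ? "zero" : "subnormal";
  else kind = "normal";
  return { sign, exponent, fraction, kind };
}

console.log(decodeDouble(1));                // biased exponent 1023, normal
console.log(decodeDouble(-0));               // sign bit set, zero
console.log(decodeDouble(Number.MIN_VALUE)); // fraction 1n, subnormal
console.log(decodeDouble(1 / 0));            // infinity
console.log(decodeDouble(0 / 0));            // NaN

// Exceptional operations return special values instead of throwing.
console.log(1 / -0);               // -Infinity (division by zero)
console.log(Object.is(0, -0));     // false: the two zeros are distinct
console.log(NaN === NaN);          // false: NaN is unordered
console.log(Number.MAX_VALUE * 2); // Infinity (overflow)
```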

The standard also includes extensive recommendations for advanced exception handling, additional operations (such as trigonometric functions), expression evaluation, and for achieving reproducible results.

IEEE Std 854-1987, the Standard for Radix-Independent Floating-Point Arithmetic, was the first Institute of Electrical and Electronics Engineers (IEEE) standard for floating-point arithmetic to cover both radix 2 and radix 10 (and no other radix, despite the title).

The standard was published in 1987[1] (in these designations, the year of ratification appears after the dash). It was overshadowed almost immediately by IEEE 754-1985, which had been ratified two years earlier, but it was never terminated. IEEE 854 did not specify any formats, whereas IEEE 754-1985 did. IEEE 754-2008 specifies floating-point arithmetic for both radix 2 (binary) and radix 10 (decimal), including two alternative encodings for radix 10 floating-point values. IEEE 754-1985 was not superseded until 2008, by IEEE 754-2008,[2] which also made many other updates to IEEE floating-point standardisation, such as adding the fused multiply-add operation.

Oracle: Numerical Computation Guide;

Wikipedia: IEEE floating point, IEEE 854-1987.