2021-02-10 UPDATE: this article is not fully worked out and has some omissions. I have since rewritten my thoughts into a more complete explanation of floating-point mechanics; see: 0.1 plus 0.2 does not equal 0.3? A deep dive into JavaScript floating-point storage from a computer's perspective


Before we begin

I was once asked in an interview: why does `0.1 + 0.2 !== 0.3` hold in JavaScript? I answered "floating-point precision error", and the interviewer was clearly unsatisfied; after all, doesn't everyone know that much?

The JavaScript "Red Book" (Professional JavaScript for Web Developers) also stresses that you should never test for a specific value like 0.1 + 0.2 === 0.3, but it only gives the following reason:

> Rounding errors are an inherent side effect of how floating-point values are computed. ECMAScript is not unique in this regard; other languages that use the same IEEE 754 numeric format have the same problem.

The quote above only says that ECMAScript's use of IEEE 754 floating-point storage causes the error. But why? Answering that calls for a dive into IEEE 754, and further into the computer itself. This article explores the problem from the perspective of computer organization.
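Before diving in, the problem and the usual workaround can be seen in a few lines of JavaScript (`nearlyEqual` is an illustrative helper, not a built-in):

```javascript
// The comparison the interview question is about:
console.log(0.1 + 0.2 === 0.3); // false
console.log(0.1 + 0.2);         // 0.30000000000000004

// The usual workaround: compare with a tolerance instead of exact equality.
// Number.EPSILON is 2^-52, the gap between 1 and the next representable double.
function nearlyEqual(a, b, epsilon = Number.EPSILON) {
  return Math.abs(a - b) < epsilon;
}

console.log(nearlyEqual(0.1 + 0.2, 0.3)); // true
```

The rest of the article explains where that trailing `4` comes from.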

What is a floating point number?

The representation of numbers in a computer

Computers represent numbers in two forms: fixed-point numbers and floating-point numbers.

  • Fixed-point number: the position of the binary point is fixed. Assuming 16-bit storage, fixing the point right after the sign bit gives a machine that can only represent pure fractions (a fractional machine), while fixing it after the last bit gives a machine that can only represent integers (an integer machine).
  • Floating-point number: analogous to decimal scientific notation. It consists of an exponent part (exponent field and exponent sign) and a mantissa part (mantissa field and mantissa sign). The number of mantissa bits determines the precision of a floating-point number; the number of exponent bits determines its range.

Normalization of floating point numbers

  1. Converting a decimal fraction to binary is a repeated multiply-by-2, take-the-integer-part operation. For example, converting 0.1 to binary:

     | Operation | Integer part | Fractional part |
     | --- | --- | --- |
     | 0.1 × 2 = 0.2 | 0 | 0.2 |
     | 0.2 × 2 = 0.4 | 0 | 0.4 |
     | 0.4 × 2 = 0.8 | 0 | 0.8 |
     | 0.8 × 2 = 1.6 | 1 | 0.6 |
     | 0.6 × 2 = 1.2 | 1 | 0.2 |
     | … | … | … |
  2. Truncating to the IEEE 754 standard's double-precision format (see the section below), whose mantissa holds up to 52 bits, we get the following result:


$$0.1 = 0.0001\,1001100110011001100110011001100110011001100110011001 \quad (52 \text{ bits})$$
  3. Normalization
  • Definition of normalized form (r is the base of the exponent, here 2; S is the mantissa):


$$\frac{1}{r} \le |S| < 1$$
  • For negative numbers in complement form, this definition does not apply directly (the complement pattern representing −1 below does not satisfy the definition):

    | Representation | S > 0, normalized form | S < 0, normalized form |
    | --- | --- | --- |
    | True value | 0.1xx…x | −0.1xx…x |
    | Sign-magnitude (original code) | 0.1xx…x | 1.1xx…x |
    | Two's complement | 0.1xx…x | 1.0xx…x |
    | One's complement (radix-minus-one) | 0.1xx…x | 1.0xx…x |
  • So in general:

    For sign-magnitude, positive or negative, the number is normalized when the first mantissa bit is 1.

    For complement form, the number is normalized when the sign bit differs from the first mantissa bit.

    In hardware this check is typically a single XOR gate: normalization is complete when the sign bit XOR the first mantissa bit equals 1.
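The multiply-by-2 conversion from step 1 above can be sketched in a few lines. `fractionToBinary` is our illustrative name, not a standard API; note that `x` is itself already a rounded double, so very long outputs eventually reflect 0.1's stored value rather than the mathematical 0.1:

```javascript
// Convert the fractional part of x to a binary string with `bits` digits,
// by repeatedly doubling and peeling off the integer part.
function fractionToBinary(x, bits) {
  let result = '0.';
  for (let i = 0; i < bits; i++) {
    x *= 2;
    if (x >= 1) {
      result += '1';
      x -= 1;
    } else {
      result += '0';
    }
  }
  return result;
}

console.log(fractionToBinary(0.1, 8)); // "0.00011001" - the 0011 pattern repeats forever
```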

IEEE754 standard

  • Storage format: S (sign bit) + exponent (biased, so it also encodes the exponent's sign) + mantissa
  • The mantissa is stored in normalized form
  • The most significant non-zero bit is always "1", so it is left implicit (not stored)
| Format | Sign bit S | Exponent | Mantissa | Total bits |
| --- | --- | --- | --- | --- |
| Short real (single precision) | 1 | 8 | 23 | 32 |
| Long real (double precision) | 1 | 11 | 52 | 64 |
| Temporary real (extended precision) | 1 | 15 | 64 | 80 |
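These fields can be inspected from JavaScript itself by viewing a number's raw bytes. This is a sketch; `doubleFields` is our illustrative helper, not an API:

```javascript
// Split a JS number (an IEEE 754 double) into its 1 + 11 + 52 bit fields.
function doubleFields(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x); // big-endian by default
  let bits = '';
  for (let i = 0; i < 8; i++) {
    bits += view.getUint8(i).toString(2).padStart(8, '0');
  }
  return {
    sign: bits[0],
    exponent: bits.slice(1, 12), // 11 bits, biased by 1023
    mantissa: bits.slice(12),    // 52 bits, implicit leading 1 omitted
  };
}

console.log(doubleFields(0.1).exponent); // "01111111011" = 1019, and 1019 - 1023 = -4
```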

How are 0.1 and 0.2 stored in a computer

Above we derived the binary representation of 0.1; the same procedure gives 0.2 ("(52)" denotes 52 mantissa bits):


$$\begin{aligned} 0.1 &= 0.0001\,1001100110011001100110011001100110011001100110011001 \ (52) \\ 0.2 &= 0.001\,1001100110011001100110011001100110011001100110011001 \ (52) \end{aligned}$$

In a computer, the floating-point exponent (P) is generally represented in biased (shift-code) form and the mantissa (S) in complement form, with a 1 implied before the binary point. For 0.1 and 0.2, the exponents and mantissas come out as:


$$\begin{aligned} S(0.1) &= 1.10011001100\ldots1 \ (1100 \times 12) \\ S(0.2) &= 1.100110011\ldots001 \ (0011 \times 12) \\ P(0.1) &= 1.0000000100 \ (-4) \\ P(0.2) &= 1.0000000011 \ (-3) \end{aligned}$$

The exponent is then converted to its biased form and everything is stored in the machine (the bracketed 1 is the implicit bit):


$$\begin{aligned} 0.1 &\Rightarrow 0:01111111011:1001100110011001100110011001100110011001100110011010 \\ 0.2 &\Rightarrow 0:01111111100:1001100110011001100110011001100110011001100110011010 \\ 0.1 &= 2^{-4}\times[1].1001100110011001100110011001100110011001100110011010 \\ 0.2 &= 2^{-3}\times[1].1001100110011001100110011001100110011001100110011010 \end{aligned}$$
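Two quick sanity checks of this layout, runnable in any JS console: since 0.1 and 0.2 share the same significand bits, doubling 0.1 only bumps the exponent; and 0.1's normalized exponent is indeed −4:

```javascript
// Doubling a binary float only increments the exponent; the significand
// bits are untouched, so this equality is exact:
console.log(0.1 * 2 === 0.2); // true

// Exponent -4 means 0.1 lies in [2^-4, 2^-3):
console.log(Math.pow(2, -4) <= 0.1 && 0.1 < Math.pow(2, -3)); // true
```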

How does 0.1+0.2 work on a computer

First, the exponents must be aligned by raising the smaller one to match the larger (the smaller number's mantissa is shifted right, so only low-order bits are dropped; precision is lost, but no outright error is made). Here the exponent difference is 1.

Note that when 0.1's mantissa is shifted right, the bit shifted in at the top is its implicit leading "1", not a default 0.


$$\begin{aligned} 0.1 &= 2^{-3}\times 0.1100110011001100110011001100110011001100110011001101 \ (0) \\ 0.2 &= 2^{-3}\times 1.1001100110011001100110011001100110011001100110011010 \\ sum &= 2^{-3}\times 10.0110011001100110011001100110011001100110011001100111 \end{aligned}$$
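The aligned addition can be replayed exactly with BigInt. The constant `sig` below is the shared 53-bit significand of 0.1 and 0.2 (implicit 1 restored); this is our sketch of the arithmetic, not how the FPU is implemented:

```javascript
// Shared 53-bit significand of 0.1 and 0.2 (implicit leading 1 included):
const sig = 0b11001100110011001100110011001100110011001100110011010n;

// 0.1 sits at exponent -4 and 0.2 at -3, so on a common scale the exact,
// unrounded sum is sig + (sig << 1), i.e. 3 * sig.
const exactSum = sig + (sig << 1n);

console.log(exactSum.toString(2).length);      // 55 - too wide for a 53-bit significand
console.log(exactSum.toString(2).slice(0, 6)); // "100110" - the 10.011001... pattern above
```

Because the exact sum needs 55 bits, it cannot be stored as-is: rounding must happen, which is the next section's topic.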

IEEE754 standard floating point rounding model

The IEEE 754 standard defines four rounding modes for floating-point numbers:

  • Round to nearest, ties to even (default): round to the nearest representable value; when exactly halfway, choose the neighbour whose least significant bit is 0 (even).

  • Round toward 0: round toward zero (truncate).

  • Round toward +∞: round toward positive infinity.

  • Round toward −∞: round toward negative infinity.

The first mode settles the 50% tie cases. There is also a mode called round to nearest, ties away from zero, which, as its name suggests, rounds ties away from zero.

Here are the five rounding modes by example. The first four are the ones IEEE 754 defines for floating point, and the first is the default:

| Mode \ Example value | +11.5 | +12.5 | −11.5 | −12.5 |
| --- | --- | --- | --- | --- |
| To nearest, ties to even (default) | +12.0 | +12.0 | −12.0 | −12.0 |
| Toward 0 | +11.0 | +12.0 | −11.0 | −12.0 |
| Toward +∞ | +12.0 | +13.0 | −11.0 | −12.0 |
| Toward −∞ | +11.0 | +12.0 | −12.0 | −13.0 |
| To nearest, ties away from zero | +12.0 | +13.0 | −12.0 | −13.0 |
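The default mode can be mimicked for integer rounding in a few lines. `roundTiesToEven` is our illustrative name, not a built-in; note that JavaScript's `Math.round` rounds halves toward +∞ instead:

```javascript
// Round to nearest integer; on an exact .5 tie, pick the even neighbour.
function roundTiesToEven(x) {
  const floor = Math.floor(x);
  const diff = x - floor;
  if (diff < 0.5) return floor;
  if (diff > 0.5) return floor + 1;
  // Exactly halfway: choose the neighbour whose parity is even.
  return floor % 2 === 0 ? floor : floor + 1;
}

console.log(roundTiesToEven(12.5));  // 12 - ties go to the even neighbour
console.log(roundTiesToEven(11.5));  // 12
console.log(roundTiesToEven(-11.5)); // -12
console.log(roundTiesToEven(-12.5)); // -12
```

These outputs match the first row of the table above.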

Let’s take a look at how sum is normalized and rounded using the above criteria.

Generation of error

First, sum is renormalized (with the implicit 1 restored) as follows; note that sum lies exactly between a and b:


$$\begin{aligned} a &= 2^{-2}\times 1.0011001100110011001100110011001100110011001100110011 \ (0) \\ sum &= 2^{-2}\times 1.0011001100110011001100110011001100110011001100110011 \ (1) \\ b &= 2^{-2}\times 1.0011001100110011001100110011001100110011001100110100 \ (0) \end{aligned}$$

Following the first (default) rounding mode above: this is an exact tie, the least significant bit of a is 1 while that of b is 0, so sum rounds to b, and that is what gets stored. Finally, compare the stored value of 0.3 with sum:


$$\begin{aligned} 0.1 + 0.2 &= 2^{-2}\times 1.0011001100110011001100110011001100110011001100110100 \\ 0.3 &= 2^{-2}\times 1.0011001100110011001100110011001100110011001100110011 \\ 0.1 + 0.2 &\Rightarrow 0:01111111101:0011001100110011001100110011001100110011001100110[100] \\ 0.3 &\Rightarrow 0:01111111101:0011001100110011001100110011001100110011001100110[011] \end{aligned}$$

The two final results differ by $2^{-2}\times 2^{-52} = 2^{-54}$!
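This $2^{-54}$ gap is directly observable in JavaScript. The subtraction below is exact, because the two operands are so close together that their difference is itself representable:

```javascript
console.log((0.1 + 0.2) - 0.3);              // 5.551115123125783e-17
console.log((0.1 + 0.2) - 0.3 === 2 ** -54); // true
```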

Converting the two numbers back to the decimal powers of ten we are used to:


$$\begin{aligned} 0.1 + 0.2 &= 0.300000000000000044408920985006\ldots \\ 0.3 &= 0.299999999999999988897769753748\ldots \end{aligned}$$

Console output:

```javascript
var ll = 0.300000000000000044408920985006; // 15 zeros in the middle
var lll = 0.299999999999999988897769753748;
console.log(ll, lll); // 0.30000000000000004 0.3
```
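The hidden digits can also be exposed without typing the long literals, using `toPrecision`:

```javascript
// Default formatting prints the shortest string that round-trips, which
// hides the error; asking for 21 significant digits reveals it.
console.log(String(0.1 + 0.2));           // "0.30000000000000004"
console.log((0.1 + 0.2).toPrecision(21)); // "0.300000000000000044409"
console.log((0.3).toPrecision(21));       // "0.299999999999999988898"
```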

Conclusion

  • A basic JS data-type question about floating-point numbers turned into a deep dive that let me revisit a good chunk of computer-organization fundamentals.

  • When I studied this subject, the textbook said little about the IEEE 754 standard, and I did not even know its rounding rules. In my earlier notes I assumed sum was rounded with a naive "0 rounds down, 1 rounds up" rule. Only while writing this article and digging into IEEE 754 did I realize it is not that simple.

References

  1. [Is floating point math broken? - Stack Overflow (answer by Wai Ha Lee)](stackoverflow.com/a/28679423)

  2. [Rounding floating-point numbers - Wikipedia](en.wikipedia.org/wiki/IEEE_7…)

  3. [IEEE 754: Rounding rules - Wikipedia](en.wikipedia.org/wiki/IEEE_7…)