Preface

Floating point operations come up often in my work, so common "bugs" such as the loss of precision in 0.1 + 0.2 are familiar to me. I knew that Number has a static property MAX_SAFE_INTEGER, but not why its range is defined the way it is. I recently took some spare time to work through these doubts thoroughly. There are articles on the Internet that cover this material, but they are either too obscure, requiring background knowledge to follow, or too scattered to give a comprehensive analysis. So I wanted to write an article of my own: on the one hand as a summary and check of my own learning, and on the other to share it in an easy-to-understand way with other students who were as confused as I was, so we can learn from each other and make progress together.

This article will first introduce some concepts, then take an in-depth look at how IEEE floating point loses precision, and finally explain why the maximum safe integer Number.MAX_SAFE_INTEGER has the value 2^53 - 1.

Floating-point numbers

To start with floating point: all numbers in JavaScript, whether integers or decimals, have a single type, Number. In accordance with the IEEE 754 standard, the Number type is essentially a 64-bit fixed-length floating-point number, namely a standard double-precision floating-point number.

The IEEE floating-point format uses scientific notation to represent real numbers. Scientific notation represents a number as a mantissa (significand) and an exponent. For example, 25.92 can be expressed as 2.592 × 10^1, where 2.592 is the mantissa and 1 is the exponent. The base of the exponent is 10, and the exponent indicates how many places the decimal point moves to produce the mantissa: every time the point moves one place to the left, the exponent increases by one; every time it moves one place to the right, the exponent decreases by one. Scientific notation means the same thing in binary.

Computer systems use binary floating-point numbers, which represent values in binary scientific notation. Since the digits are binary, the mantissa and the exponent are based on 2, not 10: for example, 1.0101 × 2^2. Moving the point of 1.0101 two places to the right yields the binary value 101.01, which represents the decimal integer 5 plus the decimal fraction 0.25 (that is, 2^-2), giving the decimal value 5.25.
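JavaScript has no literal syntax for binary fractions, but the same point-shift can be sketched with an integer binary literal and powers of two (the variable names here are just for illustration):

```javascript
// 1.0101 in binary is the integer 0b10101 scaled down by 2^4
const significand = 0b10101 / 2 ** 4; // 1.3125 in decimal

// multiplying by 2^2 moves the binary point two places right, giving 101.01
const value = significand * 2 ** 2;
console.log(value); // 5.25
```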

The composition of floating point numbers

IEEE floating point uses scientific notation to represent real numbers. The IEEE floating-point standard divides a binary string into three parts, which store the sign, exponent, and mantissa of the floating-point number:

  • Sign bit S: the first bit is the sign bit (sign). 0 represents a positive number and 1 a negative number
  • Exponent bits E: the middle 11 bits store the exponent
  • Mantissa M: the last 52 bits store the mantissa (fraction). Bits beyond 52 are rounded off, and the leading integer bit 1 is implicit and not stored

The exponent field represents the exponent part of a floating-point number. It is an unsigned integer 11 bits long, so its stored value ranges from 0 to 2047. Because the actual exponent can be positive or negative, a bias is applied: the true exponent = the stored exponent value - the bias. For 64-bit floating-point numbers the bias is 1023, so stored values in [0, 1022] represent negative exponents and values in [1024, 2047] represent positive ones.
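We can peek at the stored exponent bits of a double with a DataView (a sketch; exponentBits is a made-up helper name, not a standard API):

```javascript
// Return the raw 11-bit stored exponent E of a double.
function exponentBits(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x); // big-endian by default
  // the top 16 bits hold: 1 sign bit, 11 exponent bits, 4 mantissa bits
  return (view.getUint16(0) >> 4) & 0x7ff;
}

console.log(exponentBits(1));   // 1023 -> true exponent 1023 - 1023 = 0
console.log(exponentBits(0.5)); // 1022 -> true exponent -1
console.log(exponentBits(8));   // 1026 -> true exponent 3
```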

To obtain the value of a floating point number, calculate it by the following formula:

V = (-1)^S × 1.M × 2^(E - 1023)
The formula may seem a bit abstract, so let's take the decimal number 8.75 as an example and work out the variables in the formula. First convert 8.75 to binary; the integer part 8 corresponds to binary 1000. For the fractional part, the steps from decimal to binary are: multiply the fraction by 2 and take the integer part as the first bit of the binary representation; then multiply the remaining fraction by 2 and take the integer part as the second bit; and so on, until the fractional part is zero. The conversion of 0.75 to binary therefore goes as follows:

0.75 * 2 = 1.5 // record 1
0.5 * 2 = 1.0  // record 1
// the binary fraction for 0.75 is .11

In the end, the binary corresponding to 8.75 is 1000.11, which in scientific notation is 1.00011 × 2^3. The leading 1 is omitted, so M = 00011 (padded with zeros to 52 bits) and the true exponent is 3, hence E = 3 + 1023 = 1026. The final formula becomes: V = (-1)^0 × 1.00011 × 2^(1026 - 1023) = 8.75.
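To check these numbers, here is a sketch that extracts S, E and M from the raw 64 bits of 8.75 (doubleBits is an illustrative helper name, not a standard API):

```javascript
// Split a double into its sign, biased exponent, and 52 mantissa bits.
function doubleBits(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);
  const bits = view.getBigUint64(0);
  return {
    sign: Number(bits >> 63n),                // S
    exponent: Number((bits >> 52n) & 0x7ffn), // E, biased by 1023
    mantissa: (bits & 0xfffffffffffffn).toString(2).padStart(52, "0"), // M
  };
}

const { sign, exponent, mantissa } = doubleBits(8.75);
console.log(sign);     // 0  (positive)
console.log(exponent); // 1026, i.e. 3 + 1023
console.log(mantissa); // "00011" followed by 47 zeros
```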

In the definition of the mantissa, there is the idea that bits beyond the 52 available are rounded off. You may not have noticed that the rounding rules for IEEE 754 floating-point numbers are similar to the decimal rounding we know, but with some differences.

Rounding rules of the IEEE 754 specification

IEEE 754 rounds floating-point numbers with a rule usually called round to nearest, ties to even.

  • First, the precision loss is judged (highest priority): compute both the rounded-up and rounded-down results, and the one that loses the least precision wins. This is the "nearest" principle.
  • If the two distances are equal (that is, the precision loss is the same), pick the result whose last kept bit is even. This is the "ties to even" part.

For example, suppose the binary fraction 1.01101 is rounded to four fraction bits. Rounding up and rounding down both lose 0.00001 (binary), so the second rule applies: the least significant kept bit must be even, so we round down and the result is 1.0110. If instead we round to 2 fraction bits, rounding up loses 0.00011 while rounding down loses 0.00101, so rounding up wins and the result is 1.10. Keep this rule in mind and look at the examples below; the reason will be explained later.
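The two rules can be sketched as a small routine that rounds a binary fraction string to a given number of fraction bits (an illustrative model of the rule, not how the hardware actually works):

```javascript
// Round a string like "1.01101" to `keep` fraction bits,
// using round-to-nearest, ties-to-even.
function roundBinary(s, keep) {
  const [int, frac] = s.split(".");
  const kept = frac.slice(0, keep);
  const dropped = frac.slice(keep);
  const roundUp =
    dropped[0] === "1" &&
    (dropped.slice(1).includes("1") || // more than half an ulp: round up
      kept[keep - 1] === "1");         // exactly half: round to even
  if (!roundUp) return `${int}.${kept}`;
  // propagate the carry through the kept bits
  const n = (BigInt(`0b${int}${kept}`) + 1n)
    .toString(2)
    .padStart(keep + int.length, "0");
  return `${n.slice(0, -keep)}.${n.slice(-keep)}`;
}

console.log(roundBinary("1.01101", 4)); // "1.0110" (tie -> even)
console.log(roundBinary("1.01101", 2)); // "1.10"   (nearest)
```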

 Math.pow(2, 53)     // 9007199254740992
 Math.pow(2, 53) + 1 // 9007199254740992
 Math.pow(2, 53) + 2 // 9007199254740994
 Math.pow(2, 53) + 3 // 9007199254740996

Now that you know how floats are composed and how the mantissa is rounded, let's look at why floating-point numbers lose precision.

The precision loss problem

Having seen how the mantissa of a floating-point number works, you may already have guessed why precision is lost: it is the rounding of the finite mantissa that causes floating-point numbers to lose precision.

In the section on the composition of floating-point numbers, we saw how to convert a decimal number into binary: multiply the fraction by 2, take the integer part as the next binary bit, and so on, until the fractional part reaches 0. But there is a special case: the fractional part may cycle forever and never reach 0. A finite number of bits then cannot represent the decimal exactly, and that is where precision is lost.

Let's express 0.1 in binary by repeatedly multiplying by 2 and taking the integer bit:

// the binary expansion of 0.1 goes as follows
0.1 * 2 = 0.2 // take integer bit 0 -> 0
0.2 * 2 = 0.4 // take integer bit 0 -> 00
0.4 * 2 = 0.8 // take integer bit 0 -> 000
0.8 * 2 = 1.6 // take integer bit 1 -> 0001
0.6 * 2 = 1.2 // take integer bit 1 -> 00011
0.2 * 2 = 0.4 // take integer bit 0 -> 000110
0.4 * 2 = 0.8 // take integer bit 0 -> 0001100
0.8 * 2 = 1.6 // take integer bit 1 -> 00011001
0.6 * 2 = 1.2 // take integer bit 1 -> 000110011
// ... and so on, the 0011 pattern repeats forever
0.1 = 0.000110011001100110011... (binary)

We end up with an infinitely repeating binary fraction, 0.0001100110011001…. By the floating-point formula this is 1.100110011… × 2^-4; the leading 1 is dropped and the mantissa is cut to 52 bits, with the tail rounded up to …11010 under the rounding rules. Converted back to decimal, the stored value is 0.1000000000000000055511151231257827…, so precision has already been lost. Meanwhile, the expansion above shows that 0.2, 0.4, 0.6 and 0.8 cannot be represented exactly either. Of the nine decimals from 0.1 to 0.9, only 0.5 can be expressed exactly in binary.
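You don't have to do this expansion by hand: Number.prototype.toString(2) prints the binary digits of the stored (already rounded) double:

```javascript
console.log((0.1).toString(2));
// 0.0001100110011001100110011001100110011001100110011001101
// note the tail ...1101: the repeating 0011 pattern was rounded up

console.log((0.5).toString(2)); // 0.1 -- exactly representable
```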

Let’s continue with a question:

0.1 + 0.2 === 0.3 // false
var s = 0.3
s === 0.3 // true

Why does 0.3 === 0.3 hold while 0.1 + 0.2 !== 0.3?

// 0.1 and 0.2 are both converted to binary before the addition:
0.00011001100110011001100110011001100110011001100110011010 +
0.0011001100110011001100110011001100110011001100110011010 =
0.0100110011001100110011001100110011001100110011001100111

// the decimal value of the result is exactly 0.30000000000000004

As you can see, since neither 0.1 nor 0.2 can be represented exactly, both have already lost precision before the addition is even performed. Precision is lost when each literal is stored, not only when the expression is evaluated.
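A practical consequence: comparing computed floats with === is fragile. A common workaround (sketched here; nearlyEqual is a made-up name) is to compare against a small tolerance such as Number.EPSILON:

```javascript
// Number.EPSILON is 2^-52, the gap between 1 and the next representable double.
function nearlyEqual(a, b, eps = Number.EPSILON) {
  return Math.abs(a - b) < eps;
}

console.log(0.1 + 0.2 === 0.3);           // false
console.log(nearlyEqual(0.1 + 0.2, 0.3)); // true
```

Note that this naive tolerance only suits numbers of magnitude around 1; larger values need an epsilon scaled to the operands.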

To use a simple decimal analogy, calculate 1.7 + 1.6, rounding to whole numbers. Evaluate first, then round:

1.7 + 1.6 = 3.3 ≈ 3

Done the other way, round first and then add:

1.7 + 1.6 ≈ 2 + 2 = 4

From the two orders of operations we get two different results, 3 and 4. Similarly, in our floating-point calculation, 0.1 and 0.2 have each lost precision before the addition, so their sum is no longer exactly 0.3.

Why do you get 0.3 back if it can't be represented exactly?

let i = 0.3;
i === 0.3 // true

Why does i = 0.3 give back 0.3?

First, the 0.3 you see is not the 0.3 you think it is. Since the mantissa has a fixed length of 52 bits, plus the omitted leading bit, at most 2^53 distinct significands can be represented, which corresponds to just about 16 significant decimal digits of precision.

For example, 0.3000000000000000055 is "the same as" 0.3000000000000000051: both numbers are stored in 64-bit double-precision format as the same bit pattern as 0.3.

0.3000000000000000055 === 0.3 // true
0.3000000000000000055 === 0.3000000000000000051 // true

As you can see above, a double-precision float can distinguish only about 17 significant digits across the integer and fractional parts.

Since the precision is about 16 significant decimal digits, you can use toPrecision(16) to inspect a value at that precision; digits beyond the requested precision are rounded. For example:

(0.10000000000000000555).toPrecision(16) // "0.1000000000000000"

(0.1).toPrecision(21) // "0.100000000000000005551"

Why is [-(2^53 - 1), 2^53 - 1] the safe integer range?

In JavaScript, Number has two static properties, MAX_SAFE_INTEGER and MIN_SAFE_INTEGER, representing the largest safe integer (2^53 - 1, i.e. 9007199254740991) and the smallest safe integer (-(2^53 - 1), i.e. -9007199254740991).

"Safe" means that integers in this range correspond one-to-one with double-precision representations: no integer is represented by multiple doubles, and no double corresponds to multiple integers. So where do these two bounds come from?
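JavaScript exposes exactly this notion through Number.isSafeInteger, and the two bounds can be checked directly:

```javascript
console.log(Number.MAX_SAFE_INTEGER);                    // 9007199254740991
console.log(Number.MAX_SAFE_INTEGER === 2 ** 53 - 1);    // true
console.log(Number.MIN_SAFE_INTEGER === -(2 ** 53 - 1)); // true

console.log(Number.isSafeInteger(2 ** 53 - 1)); // true
console.log(Number.isSafeInteger(2 ** 53));     // false -- representable, but not safe
```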

Setting aside the sign and exponent bits, the mantissa of a floating-point number is 52 bits. Including the implicit leading 1, the largest significand that can be represented is 1.111…1 (52 ones). Let's compute its value. The integer bit 1 contributes decimal 1; the fractional bits contribute 2^-1 + 2^-2 + … + 2^-52, a geometric series with first term 1/2 and common ratio 1/2. By the geometric series sum formula (no need to dig out your high school textbook):

sum = (1/2) × (1 - (1/2)^52) / (1 - 1/2) = 1 - 2^-52 ≈ 0.9999999999999998

Adding the integer bit gives 1.9999999999999998, infinitely close to 2.

Now for the exponent. As we said earlier, the exponent indicates how many places the point moves, increasing each time the point moves left. When the exponent grows to 52, all the fraction bits have become integer digits, and the value is 1.111…1 (52 ones) × 2^52, which is the decimal integer 2^53 - 1, infinitely close to 2^53.

At the same time, when the exponent is 53 we can also clearly represent an integer, namely 1.000…0 × 2^53 = 2^53, so surely the maximum safe integer should be 2^53 and not the 2^53 - 1 above? Don't worry, let's keep going and look at the value of 2^53 + 1. Converted to binary, its significand would be 1.000…01 (52 zeros and then a 1), which needs 53 fraction bits. Since a 64-bit float can only store 52 mantissa bits, the last 1 must be dropped; according to the IEEE floating-point rounding rules this is a tie, and rounding to even rounds down, so precision is lost. In the end, 2^53 and 2^53 + 1 give the same result when stored in 64-bit double-precision format.

Math.pow(2, 53) // 9007199254740992
Math.pow(2, 53) === Math.pow(2, 53) + 1 // true

As said earlier, safe integers are those that correspond one-to-one with double-precision representations. At 2^53 the correspondence is no longer one-to-one (the same double stands for both 2^53 and 2^53 + 1), so [-(2^53 - 1), 2^53 - 1] is the safe integer range.

And finally, the smallest safe integer is obtained by flipping the sign bit: -(2^53 - 1).

Moving on: this is only the safe range, which does not mean 2^53 - 1 is the largest integer a double can store exactly; those are two different concepts. Look at 2^53 + 2 stored in 64-bit double-precision format: its significand is 1.000…1 (51 zeros then a 1), exactly 52 fraction bits, so it is stored with no loss of precision. Continue with 2^53 + 3: its significand is 1.00…11 (51 zeros then two ones), needing 53 fraction bits; by the rounding rule this is a tie that rounds up, giving 1.00…10 (50 zeros then 10), i.e. 2^53 + 4. This matches the results mentioned above:

Math.pow(2, 53) + 1 // 9007199254740992
Math.pow(2, 53) + 2 // 9007199254740994
Math.pow(2, 53) + 3 // 9007199254740996

If you're interested, you can continue exploring toward an exponent of 54 and beyond. As you can see, IEEE 754 can represent integers larger than the maximum safe integer; values beyond it can still be expressed, but precision problems appear, so be careful when using them.

Follow-up

For floating-point pitfalls and how to work around them, check out the article JavaScript floating point traps and solutions.

Appendix

JavaScript floating point traps and solutions

The riddle of the code

Rounding scheme of IEEE754 specification