Preface

What you need to know before reading this article:

Conversion between number bases
The E in scientific notation
The storage mechanism of floating-point types (single precision and double precision)

The most accessible explanation

Take 1 ÷ 3 = 0.33333333… as an example. We all know the 3 repeats forever. Mathematics can express that, but a computer has to store the value so it can be read back and reused later. How do you tell a computer to store 0.333333… with infinitely many digits? It cannot spend all of its memory on one number. So a computer cannot store the exact mathematical value, only an approximation, and that is why precision problems appear when the stored value is read back.

Basic knowledge you need to know

There are 10 kinds of people in the world: those who understand binary and those who don't

Binary

The base is 2
It has two digits: 0 and 1
Carry 1 on reaching 2

Octal

The base is 8
It has eight digits: 0, 1, 2, 3, 4, 5, 6, and 7
Carry 1 on reaching 8

Decimal

We use the decimal system in everyday life: carry 1 on reaching 10

Hexadecimal

The base is 16
It has 16 digit symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F
Carry 1 on reaching 16

In ancient China the unit of weight was hexadecimal: 16 liang made 1 catty, which is where the idiom "half a catty, eight liang" (two things that amount to the same) comes from

For example, in decimal we count 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11… On reaching ten we carry 1: the tens digit becomes 1 and the units digit becomes 0. In binary we count 0, 1, 10, 11, 100, 101, 110, 111, 1000… On reaching two we carry 1: the next digit up becomes 1 and the current digit becomes 0. If this is unfamiliar, look it up; here is a link to a conversion tool:

C.runoob.com/front-end/5…
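As a rough illustration of the carry rules above, the sketch below converts a non-negative integer by repeated division. The helper name toBase is made up for this example and is not part of the original article; the built-in Number.prototype.toString(radix) and parseInt(string, radix) do the same job.

```javascript
// Hypothetical helper (not from the article): convert a non-negative
// integer to its representation in the given base (2 to 16).
function toBase(n, base) {
  const digits = '0123456789ABCDEF';
  if (n === 0) return '0';
  let out = '';
  while (n > 0) {
    out = digits[n % base] + out; // the remainder is the next digit, right to left
    n = Math.floor(n / base);     // carry the rest over to the next position
  }
  return out;
}

console.log(toBase(10, 2));   // "1010"
console.log(toBase(10, 8));   // "12"
console.log(toBase(255, 16)); // "FF"

// The built-ins convert in both directions.
console.log((10).toString(2));    // "1010"
console.log(parseInt('1010', 2)); // 10
```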

E in scientific notation

E stands for exponent: the number after E is the power of 10 that the number before E is multiplied by.

For example, 7.823E5 = 782300, where E5 means 10 to the power of 5, and 54.3E-2 = 0.543, where E-2 means 10 to the power of -2. One more thing: a negative power of a number is the reciprocal of the corresponding positive power. For example, 2 to the power of -1 = 1 / (2 to the power of 1) = 1/2, and 3 to the power of -2 = 1 / (3 to the power of 2) = 1/9. Why scientific notation matters is covered later.
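The same notation works directly in JavaScript number literals, so the examples can be checked in a console. This is only a small sketch added for illustration.

```javascript
// E notation: the number before e is multiplied by 10 to the power after e.
console.log(7.823e5 === 782300); // true
console.log(54.3e-2 === 0.543);  // true

// A negative power is the reciprocal of the positive power.
console.log(2 ** -1 === 1 / 2);  // true
console.log(1 / 3 ** 2);         // 0.1111111111111111, that is 1/9
```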

Floating-point storage mechanism (single precision, double precision)

Floating-point data types include single precision (float) and double precision (double)

Single-precision float

A single-precision floating-point number occupies 4 bytes (32 bits) in memory, has about 7 significant decimal digits, and covers the range -3.40E+38 ~ +3.40E+38

Double-precision float (double)

A double-precision floating-point number occupies 8 bytes (64 bits) in memory, has about 16 significant decimal digits, and covers the range -1.79E+308 ~ +1.79E+308

Floating-point constants can be written in two ways:

Decimal form: digits and a decimal point, such as 0.123 and 123.0
Scientific notation: 123e3 or 123E3, where e (or E) must be preceded by a number and the exponent after it must be an integer (negative integers included)

A floating-point number is simply a number with a fractional part. It is called "floating" because the position of the point in the underlying binary digits is not fixed: it can float. You have to admire how much meaning is packed into the name

But!!

JavaScript stores numbers differently from languages such as Java and Python, which have separate integer and floating-point types. In JavaScript there is only one type, Number, for all numbers, integers and decimals alike. It is implemented according to the IEEE 754 standard. The one thing to remember is that JavaScript stores every Number as a 64-bit double-precision floating-point number, so each value gets at most 64 binary digits.

The advantage of this storage structure is that integers and decimals are handled uniformly, which saves storage space. For an integer, converting between decimal and binary is easy. For a floating-point number it is harder because of the point: its position is not fixed (the number of digits after it varies), so storing the point is the challenge. The solution is scientific notation, for example 1.01 * 2^4, where the point always sits in the same place. Because computers work in binary (0 and 1), a binary number is written in scientific notation with the formula:

X = a * 2^e

Here a is the binary significand (a leading 1, the point, then digits that are each 0 or 1), and e is the number of places the point was moved

Example: 27.0 in binary is 11011.0, which in scientific notation is

1.10110 * 2^4

So how is 1.10110 * 2^4 stored? A double is 8 bytes (64 bits) long. The leftmost bit is the sign. The 11 bits in the middle hold e (the exponent), the number of places the point was moved. The 52 bits on the right hold the digits after the point.

Layout (1 + 11 + 52 = 64 bits):

1 bit for the sign
11 bits for the exponent
52 bits for the fraction

Sign bit: 0 represents a positive number, 1 represents a negative number

For example, -0.896 is negative, so its sign bit is 1. 0.123 is positive, so its sign bit is 0

Exponent bits: e can be positive or negative. For 1.10110 * 2^4, e is 4. For 0.101, which is 1.01 * 2^-1, e is -1. To store it, add the exponent bias to e and convert the sum to binary. That is the exponent field

Fraction part (also called the mantissa): the digits after the point once the binary number is in scientific notation. For 1.1011 * 2^4 the fraction is 1011. The field is always 52 bits long; shorter fractions are padded with zeros on the right. Counting the implicit leading 1, the significand has 53 bits, so every integer up to Math.pow(2, 53), which is 9007199254740992 in decimal, can be represented exactly in JS

Exponent bias: the formula is

X = 2^(k-1) - 1

k is the number of exponent bits. A double-precision floating-point number has 11 exponent bits, so X = 2^(11-1) - 1 = 1023.

So the exponent bias of a double-precision floating-point number is 1023
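A minimal sketch of that calculation, assuming the usual IEEE 754 bias of 2^(k-1) - 1 for k exponent bits. The function name exponentBias is hypothetical and used only here.

```javascript
// Exponent bias for a binary floating-point format with k exponent bits.
function exponentBias(k) {
  return 2 ** (k - 1) - 1;
}

console.log(exponentBias(11)); // 1023 (double precision)
console.log(exponentBias(8));  // 127  (single precision)

// Stored exponent field for e = 4, written as an 11-bit binary string.
console.log((4 + exponentBias(11)).toString(2).padStart(11, '0')); // "10000000011"
```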

That is, the computer stores: sign bit + exponent bits + fraction bits (mantissa)

Example: 27.5 in binary is 11011.1

11011.1 in scientific notation is 1.10111 * 2^4

The sign bit is 0 (positive)

The exponent field is 4 plus the exponent bias 1023, which is 1027. That is a decimal value and has to be converted to binary: 10000000011

The fraction is 10111, padded with zeros to 52 bits:

1011 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

(The exponent bias is 2^(11-1) - 1 = 1023)

So 27.5 is stored in the computer's standard binary form as

sign bit + exponent bits + fraction bits (mantissa)

0 + 10000000011 + 1011 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

That is:

0100 0000 0011 1011 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

Exactly 64 bits

Note the distinction: the scientific-notation form keeps only 52 fraction bits, while the stored double-precision number is 64 bits in total

To recap, here is how the computer stores the number 27.5

First, convert the number to binary: 11011.1

Then convert the binary to scientific notation: 1.10111 * 2^4

JS stores numbers as double-precision floating-point values (64 bits), namely sign bit [0] + exponent bits [4 + 1023 (the fixed bias) => 10000000011] + fraction bits [10111, padded with zeros to 52 bits]

0100 0000 0011 1011 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

Is the digit before the point not stored?

No. In binary scientific notation the digit before the point is always 1, so it is left implicit and the computer restores it automatically
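The layout described above can be inspected from JavaScript itself. The following is a sketch rather than anything from the original article: doubleBits is a hypothetical helper name, and it relies on the standard DataView API to read the eight bytes of a Number.

```javascript
// Hypothetical helper: split a Number into its IEEE 754 double-precision fields.
function doubleBits(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x); // big-endian by default, so the sign bit comes first
  let bits = '';
  for (let i = 0; i < 8; i++) {
    bits += view.getUint8(i).toString(2).padStart(8, '0');
  }
  return {
    sign: bits.slice(0, 1),      // 1 bit
    exponent: bits.slice(1, 12), // 11 bits, biased by 1023
    fraction: bits.slice(12),    // 52 bits, the leading 1 is implicit
  };
}

const parts = doubleBits(27.5);
console.log(parts.sign);                         // "0"
console.log(parts.exponent);                     // "10000000011"
console.log(parseInt(parts.exponent, 2) - 1023); // 4
console.log(parts.fraction.length);              // 52
console.log(parts.fraction.replace(/0+$/, ''));  // "10111"
```

Running it on a negative value such as -0.896 shows a sign bit of "1".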

OK, now back to the point!

Here are some typical cases of precision loss with JS numbers

// Addition =====================
0.1 + 0.2 = 0.30000000000000004
0.7 + 0.1 = 0.7999999999999999
0.2 + 0.4 = 0.6000000000000001

// Subtraction =====================
1.5 - 1.2 = 0.30000000000000004
0.3 - 0.2 = 0.09999999999999998

// Multiplication =====================
19.9 * 100 = 1989.9999999999998
0.8 * 3 = 2.4000000000000004
35.41 * 100 = 3540.9999999999995

// Division =====================
0.3 / 0.1 = 2.9999999999999996
0.69 / 10 = 0.06899999999999999

Why does 0.1 + 0.2 === 0.30000000000000004?

Let's derive it, starting with 0.1

0.1 >>> binary >>> scientific notation >>> actual 64-bit storage form (sign bit + (exponent + exponent bias) + fraction), namely:

0.1 >>> 0.0001100110011001100110011001100110011001100110011001101 >>> 1.100110011001100110011001100110011001100110011001101 * 2^(-4) >>>

0011111110111001100110011001100110011001100110011001100110011010

Likewise for 0.2

0.2 >>> binary >>> scientific notation >>> actual 64-bit storage form, namely:

0.2 >>> 0.001100110011001100110011001100110011001100110011001101 >>> 1.100110011001100110011001100110011001100110011001101 * 2^(-3) >>>

0011111111001001100110011001100110011001100110011001100110011010

You can see that after conversion to binary

0.1 >>> 0.0001 1001 1001 1001... (1001 repeats forever)

0.2 >>> 0.0011 0011 0011 0011... (0011 repeats forever)

Just as some numbers cannot be written with finitely many decimal digits, such as π = 3.1415926… or 1.3333…, these binary expansions never end. The binary scientific notation keeps only 53 significant bits (the implicit 1 plus 52 fraction bits), so the rest has to be rounded. Decimal rounds "4 down, 5 up"; binary has only 0 and 1, so it becomes "0 down, 1 up". Once this step introduces an error, every later step inherits it, so it is only natural that decimals stored in a computer carry errors. This rounding is the root cause of precision loss in floating-point calculations

Write out the binary forms of 0.1 and 0.2 as they are actually stored, pad the ends with zeros so they line up, and add them:

0.00011001100110011001100110011001100110011001100110011010 +
0.00110011001100110011001100110011001100110011001100110100 =
0.01001100110011001100110011001100110011001100110011001110

0.1 + 0.2 >>> 0.0100 1100 1100 1100... (1100 repeats)

In binary scientific notation, rounded to fit the fraction field, the result of 0.1 + 0.2 is 1.001100110011001100110011001100110011001100110011010 * 2^(-2); omitting the trailing 0 gives 1.00110011001100110011001100110011001100110011001101 * 2^(-2). So (0.1 + 0.2) is actually stored as 0011111111010011001100110011001100110011001100110011001100110100. This binary number, already rounded to the storage width, converts back to decimal as 0.30000000000000004

This completes the derivation

Conclusion: to store a double-precision floating-point number, the computer first converts the decimal number to binary scientific notation, then stores it according to its own layout {sign bit + (exponent + exponent bias, in binary) + fraction}. Storage is limited to 64 bits, and some decimal numbers have an infinitely repeating binary expansion, so the value is rounded in binary (0 down, 1 up), and the error shows up when the result is converted back to decimal.
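One way to check the stored forms from the derivation is to read the raw bits back. The sketch below is not from the original article; toHex64 is a hypothetical helper name built on the standard DataView API. Each hex digit stands for four of the stored bits.

```javascript
// Hypothetical helper: the 64 stored bits of a Number, as 16 hex digits.
function toHex64(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x); // big-endian: sign and exponent come first
  const hi = view.getUint32(0).toString(16).padStart(8, '0');
  const lo = view.getUint32(4).toString(16).padStart(8, '0');
  return hi + lo;
}

console.log(toHex64(0.1));       // "3fb999999999999a"
console.log(toHex64(0.2));       // "3fc999999999999a"
console.log(toHex64(0.1 + 0.2)); // "3fd3333333333334"
console.log(toHex64(0.3));       // "3fd3333333333333"
```

The last two lines differ only in the final bit, which is exactly the gap between 0.30000000000000004 and 0.3.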

Common precision loss scenario analysis

Why does x=0.1 get 0.1?

The significand holds 53 bits, so the largest consecutive integer it can represent is 9007199254740992. Written in scientific notation its mantissa is 9.007199254740992, which has 16 digits, and that is roughly the maximum decimal precision JS can represent. You can therefore approximate what JS displays with toPrecision(16): values beyond that precision are rounded automatically. So we have:

0.10000000000000000555.toPrecision(16) // "0.1000000000000000", which is 0.1 once the trailing zeros are removed

But the 0.1 you see is not actually 0.1. If you don't believe it, try a higher precision:

0.1.toPrecision(21) // "0.100000000000000005551"

The large-number crisis

9999999999999999 == 10000000000000001 is true?

A 16-digit number and a 17-digit number turn out to be equal

Precision loss in large integers is essentially the same as in floating-point numbers. As mentioned above, the fraction field holds 52 bits; with the implicit leading 1, at most 53 binary digits can be stored, and anything beyond that is rounded (0 down, 1 up). So every integer up to Math.pow(2, 53), which is 9007199254740992, is exact in JS. Integers greater than 9007199254740992 may lose precision

Large numbers converted to binary

9007199254740992 >> 10000000000000…000 (no precision lost)
9007199254740992 + 1 >> 10000000000000…001 (precision lost)
9007199254740992 + 2 >> 10000000000000…010 (no precision lost)

As shown above, a number that looks short in decimal can need more binary digits than the significand can hold. Because of the bit limit it is rounded, and precision is lost. The two large numbers therefore end up with the same binary representation in the computer and compare equal

Therefore, 9999999999999999 == 10000000000000001 is true

parseInt() has the same issue: converting a string to an integer follows the same rules as floating-point numbers, so parseInt() may lose precision for values greater than 9007199254740992

In Taobao's early order system, order numbers were handled as numbers. Later the randomly generated order numbers grew quickly and exceeded 9007199254740992, and the final fix was to store order numbers as strings. To work with large numbers you can use a third-party library such as bignumber.js. Its principle is to treat numbers as strings and re-implement the arithmetic, at the cost of much worse performance than native numbers. Native support for large numbers is therefore very much needed. TC39's BigInt proposal, at Stage 3 when this article was written and since standardized in ES2020, addresses the large-integer problem
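The behaviour around 9007199254740992 is easy to reproduce. The snippet below is an added sketch, not part of the original article; it uses the standard Number.isSafeInteger and BigInt built-ins, and the names limit and orderId are chosen only for this example.

```javascript
const limit = Math.pow(2, 53); // 9007199254740992

console.log(limit + 1 === limit); // true: 2^53 + 1 cannot be represented
console.log(limit + 2);           // 9007199254740994
console.log(9999999999999999 === 10000000000000001); // true

// Number.MAX_SAFE_INTEGER is 2^53 - 1; beyond it, distinct integers can collide.
console.log(Number.isSafeInteger(limit - 1)); // true
console.log(Number.isSafeInteger(limit));     // false

// Large IDs such as order numbers stay exact as strings or BigInt values.
const orderId = BigInt('9007199254740993');
console.log((orderId + 1n).toString()); // "9007199254740994"
```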

toFixed() rounds incorrectly when the last decimal digit is 5

// toFixed results in Firefox/Chrome
1.35.toFixed(1) // 1.4 correct
1.335.toFixed(2) // 1.33 wrong
1.3335.toFixed(3) // 1.333 wrong
1.33335.toFixed(4) // 1.3334 correct
1.333335.toFixed(5) // 1.33333 wrong
1.3333335.toFixed(6) // 1.333333 wrong

As you can see, only the inputs with 2 and 5 decimal places are rounded correctly; the rest are wrong. The Firefox and Chrome implementations themselves are fine. The root cause is the precision loss of floating-point numbers in the computer

For example, 1.005.toFixed(2) returns 1.00 instead of 1.01.

Cause: the value actually stored for 1.005 is 1.00499999999999989, so the trailing digits are rounded down

1.005.toPrecision(21) // 1.00499999999999989342

Repair Mode 1

/*
 * Work around the toFixed rounding issue in Firefox/Chrome:
 * if the digit being rounded away is a trailing 5, change it to 6, then call toFixed
 * @param number {number} original number
 * @param precision {number} number of decimal places to keep
 */
function toFixed(number, precision) {
    var str = number + ''
    var len = str.length
    var dotPos = str.indexOf('.')
    var decimals = dotPos === -1 ? 0 : len - dotPos - 1
    var last = str.substring(len - 1, len)
    // Only patch when the trailing 5 is exactly the digit that gets rounded away
    if (last == '5' && decimals === precision + 1) {
        last = '6'
        str = str.substring(0, len - 1) + last
        return (str - 0).toFixed(precision)
    } else {
        return number.toFixed(precision)
    }
}
console.log(toFixed(1.333335, 5))

Repair Mode 2

// Scale up first, then scale back down
function toFixed(num, s) {
    var times = Math.pow(10, s)
    // Add 0.5 so that truncation rounds to the nearest integer
    var des = num * times + 0.5
    // Drop the fractional part, then scale back down
    des = parseInt(des, 10) / times
    return des + ''
}
console.log(toFixed(1.333332, 5))
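A further variant, sketched here as an assumption rather than as the author's method, shifts the decimal point through the string form of the number, so no inexact multiplication takes place before rounding. The name roundTo is hypothetical. It assumes the number does not already print in exponent form (roughly 1e-6 <= |value| < 1e21).

```javascript
// Hypothetical helper: round to `digits` decimal places by moving the
// decimal point in the string form ("1.005e2" parses to exactly 100.5).
function roundTo(value, digits) {
  const shifted = Math.round(Number(value + 'e' + digits));
  return Number(shifted + 'e-' + digits).toFixed(digits);
}

console.log(roundTo(1.005, 2));    // "1.01"
console.log(roundTo(1.335, 2));    // "1.34"
console.log(roundTo(1.333335, 5)); // "1.33334"
console.log((1.005).toFixed(2));   // "1.00", for comparison
```

Math.round rounds halves toward positive infinity, so negative inputs ending in 5 round up toward zero (for example -1.005 gives "-1.00"); handle the sign separately if symmetric rounding is needed.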

ES6 adds a very small constant to the Number object: Number.EPSILON

Number.EPSILON
// 2.220446049250313e-16
Number.EPSILON.toFixed(20)
// "0.00000000000000022204"

The point of introducing such a small quantity is to set an error margin for floating-point calculations: if the error is less than Number.EPSILON, the result can be considered correct.

Error-checking function (from "Introduction to ES6 Standards" by Ruan Yifeng)

function withinErrorMargin (left, right) {
    return Math.abs(left - right) < Number.EPSILON
}
withinErrorMargin(0.1 + 0.2, 0.3) // true
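Number.EPSILON is the spacing between 1 and the next representable number, so a fixed margin of that size fits values near 1. For larger operands the spacing between neighbouring doubles grows, and one common adjustment is to scale the margin by the operands' magnitude. The sketch below shows that idea under this assumption; approxEqual is a hypothetical name, not something from the book cited above.

```javascript
// Hypothetical variant: scale the tolerance by the size of the operands.
function approxEqual(a, b) {
  const scale = Math.max(1, Math.abs(a), Math.abs(b));
  return Math.abs(a - b) <= Number.EPSILON * scale;
}

console.log(approxEqual(0.1 + 0.2, 0.3));          // true
console.log(approxEqual(1000.1 + 1000.2, 2000.3)); // true
console.log(approxEqual(1, 1.000001));             // false
```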

Handling data for display

When you have a value like 1.4000000000000001 to display, it is recommended to round it with toPrecision and convert the result back to a number with parseFloat, as follows:

parseFloat(1.4000000000000001.toPrecision(12)) === 1.4  // true

Wrapped in a function:

function strip(num, precision = 12) {
  return +parseFloat(num.toPrecision(precision));
}

Why choose 12 as the default precision?

It is a rule of thumb. 12 generally clears up results that end in ...0001 or ...0009, which is enough in most cases; increase it if you need more precision.

That said, toPrecision is only suitable for display, not for operations like + - * /. For arithmetic, the correct approach is to convert the decimals to integers first and then operate (scale up, then scale down).

Back to floating-point calculations!

We can scale the operands up (multiply by 10 to the nth power) into integers that the computer represents exactly, and scale the result back down (divide by 10 to the nth power) once the calculation is done. This is a common way to handle precision problems in most languages. For example:

0.1 + 0.2 == 0.3 // false
(0.1 * 10 + 0.2 * 10) / 10 == 0.3 // true
(0.1 * 100 + 0.2 * 100) / 100 == 0.3 // true

35.41 * 100 == 3540.9999999999995 // true
// Precision is still lost even after scaling up and back down
(35.41 * 100 * 100) / 100 == 3541 // false
// Math.round() takes care of the residual error
Math.round(35.41 * 100) === 3541 // true

Conclusion: scaling up and down alone does not fully solve precision loss; the scaled value also needs rounding. To pick the scale factor, use indexOf(".") to find the number of digits after the decimal point in each operand, take the larger count (that is how many powers of ten to scale by), and scale the result back down after the calculation.

Let's start with a simple example

// Addition
function add(num1, num2) {
  const num1Digits = (num1.toString().split('.')[1] || '').length
  const num2Digits = (num2.toString().split('.')[1] || '').length
  const baseNum = Math.pow(10, Math.max(num1Digits, num2Digits))
  return (num1 * baseNum + num2 * baseNum) / baseNum
}

The method above covers most scenarios. Numbers written in scientific notation, such as 2.3e+1, need a little extra handling.

That matters because once a number has more than 21 digits, JS forces it into scientific notation.
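One possible form of that extra handling is to split the string on the exponent marker before counting digits. This is a sketch under that assumption; digitLength is a hypothetical helper name and not code from the original article.

```javascript
// Hypothetical helper: count decimal places, including numbers whose
// string form uses exponent notation such as "1.5e-7".
function digitLength(num) {
  const [mantissa, exponent] = num.toString().split(/[eE]/);
  const decimals = (mantissa.split('.')[1] || '').length;
  const len = decimals - Number(exponent || 0);
  return len > 0 ? len : 0;
}

console.log(digitLength(0.1));     // 1
console.log(digitLength(35.41));   // 2
console.log(digitLength(1.5e-7));  // 8
console.log(digitLength(2.3e+21)); // 0
```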

The example above only works for addition and subtraction; multiplication and division need different handling. The idea is still to scale up and then scale back down, but the logic for choosing the scale changes.
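For multiplication, a minimal sketch of the changed logic: each operand is turned into an integer by dropping its decimal point, the integers are multiplied, and the product is divided by 10 to the power of the combined number of decimal places. The helper names below are hypothetical, and the sketch assumes plain decimal inputs (no exponent notation) whose integer forms stay below 2^53.

```javascript
// Hypothetical helpers, for plain decimal inputs only.
function decimalPlaces(num) {
  return (num.toString().split('.')[1] || '').length;
}
function toInt(num) {
  return Number(num.toString().replace('.', '')); // "19.9" -> 199
}

// Multiply as integers, then scale down by the combined decimal places.
function multiply(a, b) {
  const scale = Math.pow(10, decimalPlaces(a) + decimalPlaces(b));
  return (toInt(a) * toInt(b)) / scale;
}

console.log(19.9 * 100);           // 1989.9999999999998
console.log(multiply(19.9, 100));  // 1990
console.log(multiply(35.41, 100)); // 3541
console.log(multiply(0.1, 0.2));   // 0.02
```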

The solution

The complete code to handle addition, subtraction, multiplication and division is as follows:

        /**
         * floatObj contains four methods (add, subtract, multiply, divide) that keep
         * floating-point arithmetic from losing precision.
         *
         * Floating-point calculations can lose precision (rounding errors). The root cause is
         * that, with binary and a finite bit width, some numbers cannot be represented finitely.
         * Decimal values and their binary representations:
         *   0.1 >> 0.0001 1001 1001 1001... (1001 repeats forever)
         *   0.2 >> 0.0011 0011 0011 0011... (0011 repeats forever)
         * Computers store each data type with a finite width. For example, JavaScript uses 64 bits
         * for the Number type, so anything beyond that is discarded. The discarded part is the lost precision.
         *
         * ** methods **
         *   add / subtract / multiply / divide
         *
         * ** examples **
         *   0.1 + 0.2 == 0.30000000000000004 (0.00000000000000004 too much)
         *   0.2 + 0.4 == 0.6000000000000001  (0.0000000000000001 too much)
         *   19.9 * 100 == 1989.9999999999998 (0.0000000000002 too little)
         *
         *   floatObj.add(0.1, 0.2) === 0.3
         *   floatObj.multiply(19.9, 100) === 1990
         */
        var floatObj = function () {

            /* Check whether obj is an integer: an integer equals itself after rounding down, so use that as the test */
            function isInteger(obj) {
                // Or use Number.isInteger()
                return Math.floor(obj) === obj
            }
            /*
             * Convert a floating-point number to an integer, returning the integer and the multiple.
             * For example, 3.14 >> 314 with a multiple of 100
             * @param floatNum {number} decimal number
             * @return {object}
             *   {times: 100, num: 314}
             */
            function toInteger(floatNum) {
                // times is the precision multiple, num is the converted integer
                var ret = { times: 1, num: 0 }
                var isNegative = floatNum < 0  // Is it a negative number
                if (isInteger(floatNum)) {  // Is an integer
                    ret.num = floatNum
                    return ret  // An integer is returned directly
                }
                var strfi = floatNum + ''  // Convert to a string
                var dotPos = strfi.indexOf('.')
                var len = strfi.substr(dotPos + 1).length // Number of digits after the decimal point
                var times = Math.pow(10, len)  // Precision multiple
                /* Why add 0.5? Multiplication can itself lose precision. For example, 0.16344556 * 10000000 = 1634455.5999999999, which is 0.0000000001 short. Adding 0.5 gives 0.16344556 * 10000000 + 0.5 = 1634456.0999999999, so parseInt then yields the correctly rounded integer. */
                var intNum = parseInt(Math.abs(floatNum) * times + 0.5, 10)
                ret.times = times
                if (isNegative) {
                    intNum = -intNum
                }
                ret.num = intNum
                return ret
            }

            /* * * core method, realize the addition, subtraction, multiplication and division operation, to ensure that the accuracy of * thought: to enlarge the decimal to integer (multiply), arithmetic operation, and then reduce to decimal (divide) * @param a {number} operand 1 * @param b {number} operand 2 */
            function operation(a, b, op) {
                var o1 = toInteger(a)
                var o2 = toInteger(b)
                var n1 = o1.num  // e.g. 3.25 + 3.153
                var n2 = o2.num
                var t1 = o1.times
                var t2 = o2.times
                var max = t1 > t2 ? t1 : t2
                var result = null
                switch (op) {
                    // Add and subtract according to the multiple relationship
                    case 'add':
                        if (t1 === t2) { // Both operands have the same number of decimal places
                            result = n1 + n2
                        } else if (t1 > t2) {
                            // o1 has more decimal places than o2
                            result = n1 + n2 * (t1 / t2)
                        } else {  // o1 has fewer decimal places than o2
                            result = n1 * (t2 / t1) + n2
                        }
                        return result / max
                    case 'subtract':
                        if (t1 === t2) {
                            result = n1 - n2
                        } else if (t1 > t2) {
                            result = n1 - n2 * (t1 / t2)
                        } else {
                            result = n1 * (t2 / t1) - n2
                        }
                        return result / max
                    case 'multiply':
                        // 325*3153/(100*1000) 100 times larger ==> 100 times smaller
                        result = (n1 * n2) / (t1 * t2)
                        return result
                    case 'divide':
                        // (325/3153)*(1000/100) 100 times smaller ==> 100 times larger
                        result = (n1 / n2) * (t2 / t1)
                        return result
                }
            }

            // Add, subtract, multiply, divide
            function add(a, b) {
                return operation(a, b, 'add')
            }
            function subtract(a, b) {
                return operation(a, b, 'subtract')
            }
            function multiply(a, b) {
                return operation(a, b, 'multiply')
            }
            function divide(a, b) {
                return operation(a, b, 'divide')
            }
            return {
                add: add,
                subtract: subtract,
                multiply: multiply,
                divide: divide
            }
        }();
        console.log(floatObj.add(0.16344556, 3.153)) // 3.31644556

If you find this call style cumbersome, you can also add the corresponding methods to Number.prototype. I won't show that here.

Of course, you can also solve this problem with a mature library

For example math.js, which can be downloaded from CDNJS or linked directly:

Cdnjs.cloudflare.com/ajax/libs/m…

number-precision also supports addition, subtraction, multiplication, division, and rounding of floating-point numbers. It is very small at about 1K, much smaller than most comparable libraries (math.js, BigDecimal.js)

Github.com/dt-fe/numbe…

That's all.

If you found this article useful, click follow. Thanks!
