Preface

As a senior front-end developer, you cannot afford to get numeric precision wrong. So what should we know about the JS precision problem?

  • Why does precision loss occur in JS floating-point arithmetic?

  • What are the solutions?

To solve a problem, we first need to get to its root.

Why

To know why, you must know the standard and rules that JS follows for floating-point arithmetic. JavaScript floating-point arithmetic is based on the IEEE 754 standard.

IEEE 754

IEEE 754, the IEEE Standard for Binary Floating-Point Arithmetic, has been the most widely used floating-point standard since the 1980s.

IEEE 754 specifies four formats for representing floating-point values: single precision (32 bits), double precision (64 bits), extended single precision (at least 43 bits, rarely used), and extended double precision (at least 79 bits, usually implemented as 80 bits). JavaScript numbers use the double-precision (64-bit) format.

    

What are the storage and computation rules of IEEE 754?

A floating-point number is represented on a computer as:

Value = (-1)^sign × 2^(exponent - bias) × 1.fraction

That is, the actual value of a floating-point number is the sign, times 2 raised to the stored exponent minus its bias, times the significand (an implicit leading 1 followed by the fraction bits).

  • Sign: the most significant bit, which marks the number as positive or negative; 0 represents a positive number and 1 a negative number.

  • Exponent: stored as a biased value, equal to the actual exponent plus a fixed bias. The bias is 2^(e-1) - 1, where e is the width of the exponent field: 8 bits for the 32-bit format (bias 127) and 11 bits for the 64-bit format (bias 1023).

  • Fraction: the mantissa, which can be understood as the fractional part of the significand. Bits that do not fit in the field are rounded off (by default to the nearest representable value, ties to even).

S is the sign bit, Exp is the exponent field, and Fraction is the significand. The exponent is stored in biased form (excess notation): the stored value is the actual exponent plus a fixed bias (1023 in the 64-bit case). The purpose of this representation is to simplify comparison. The exponent may be positive or negative; if it were stored in two's complement, both the sign bit S and the sign of Exp would have to be examined, and two floating-point values could not be compared with a simple magnitude comparison. For this reason the exponent is stored as an unsigned value. For double precision, the actual exponents −1022 to +1023 are stored with 1023 added, so the stored value ranges from 1 to 2046 (0, all bits zero, and 2047, all bits one, are reserved for special values). When the value of a floating-point number is computed, the bias is subtracted from the stored exponent to recover the actual exponent.

Format    Sign bits    Exponent bits    Mantissa bits
32-bit    1            8                23
64-bit    1            11               52
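The 1 / 11 / 52 split of the 64-bit format can be inspected directly from JavaScript. The following is a minimal sketch, not part of the original article: `decodeDouble` is a hypothetical helper name, and it relies only on the standard `DataView` API to read the raw bits of a number.

```javascript
// Split a JavaScript number into its IEEE 754 double-precision fields.
// decodeDouble is a hypothetical helper used only for illustration.
function decodeDouble(value) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, value); // DataView writes big-endian by default

  // Read the 64 bits as two 32-bit halves and join them into one bit string.
  const hi = view.getUint32(0).toString(2).padStart(32, '0');
  const lo = view.getUint32(4).toString(2).padStart(32, '0');
  const bits = hi + lo;

  const exponentBits = bits.slice(1, 12);
  return {
    sign: bits.slice(0, 1),                      // 1 bit
    exponentBits,                                // 11 bits, biased
    fractionBits: bits.slice(12),                // 52 bits
    exponent: parseInt(exponentBits, 2) - 1023,  // subtract the bias
  };
}

const d = decodeDouble(0.1);
console.log(d.sign);                // "0"
console.log(d.exponentBits);        // "01111111011" (1019 stored, 1019 - 1023 = -4)
console.log(d.exponent);            // -4
console.log(d.fractionBits.length); // 52
```

The fraction bits of 0.1 come out as the pattern 1001 repeated twelve times followed by 1010: the final group is rounded up because the binary expansion of 0.1 never terminates.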

Let's take an example: converting a decimal number to its 32-bit IEEE 754 floating-point representation.

For example: 178.125

Step 1: Convert the integer and fractional parts to binary

  1. The integer part, obtained by repeatedly dividing by 2 and taking the remainder: 10110010
  2. The fractional part, obtained by repeatedly multiplying by 2 and taking the integer digit: 001
  3. Together: 10110010.001
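The two conversion rules in step 1 can be written out as a short sketch. The helper names `integerToBinary` and `fractionToBinary` are hypothetical and only illustrate the hand calculation; in practice `Number.prototype.toString(2)` gives the same digits.

```javascript
// Integer part: repeatedly divide by 2 and collect the remainders,
// reading them from last to first.
function integerToBinary(n) {
  if (n === 0) return '0';
  let bits = '';
  while (n > 0) {
    bits = (n % 2) + bits;
    n = Math.floor(n / 2);
  }
  return bits;
}

// Fractional part: repeatedly multiply by 2 and collect the integer digit.
// maxBits caps the loop because most decimal fractions never terminate in binary.
function fractionToBinary(f, maxBits = 32) {
  let bits = '';
  while (f > 0 && bits.length < maxBits) {
    f *= 2;
    if (f >= 1) {
      bits += '1';
      f -= 1;
    } else {
      bits += '0';
    }
  }
  return bits;
}

console.log(integerToBinary(178));      // "10110010"
console.log(fractionToBinary(0.125));   // "001"
console.log((178.125).toString(2));     // "10110010.001"
console.log(fractionToBinary(0.1, 12)); // "000110011001" (cut off after 12 bits)
```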

Step 2: Normalize the binary number.

That is, move the binary point until a single 1 remains in front of it: 1.0110010001 * 2^111. The point moved 7 places to the left, and 7 is 111 in binary.

Step 3: Determine the sign bit, exponent code, and mantissa

  1. Sign: 0, because the number is positive (it would be 1 for a negative number).

  2. Exponent code: the exponent is 111 in binary (7 in decimal). For single precision the bias is 2^(e-1) - 1, where e is the width of the exponent field; e is 8, so the bias is 127 (01111111). The stored exponent is therefore 111 + 01111111 = 10000110.

  3. Mantissa: the digits after the binary point, i.e. 0110010001, padded with zeros on the right to fill the 23-bit field.

    

One question might be: what happened to the 1 in front of the binary point? Because the mantissa is stored in normalized form, its highest bit is always 1, so that bit is left implicit. Hiding it frees one more bit of storage and so improves precision.
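The result of the three steps can be checked against the bits a JavaScript engine actually stores. This is a small sketch under the assumption that reading the bits through the standard `DataView` API is acceptable for illustration; `float32Bits` is a hypothetical helper name.

```javascript
// Return the sign, exponent and mantissa fields of a value stored as a
// 32-bit IEEE 754 float. float32Bits is a hypothetical helper.
function float32Bits(value) {
  const view = new DataView(new ArrayBuffer(4));
  view.setFloat32(0, value); // big-endian by default
  const bits = view.getUint32(0).toString(2).padStart(32, '0');
  return {
    sign: bits.slice(0, 1),     // 1 bit
    exponent: bits.slice(1, 9), // 8 bits, biased by 127
    mantissa: bits.slice(9),    // 23 bits, hidden leading 1 not stored
  };
}

const f = float32Bits(178.125);
console.log(f.sign);     // "0"
console.log(f.exponent); // "10000110" (7 + 127 = 134)
console.log(f.mantissa); // "01100100010000000000000"
```

The mantissa field starts with 0110010001, the digits after the binary point, and the leading 1 is absent, which matches the hidden-bit explanation above.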

Floating-point operations

So why does 0.1 + 0.2 lose precision?

First, decimal 0.1 and 0.2 are converted to binary, and in binary both are infinitely repeating fractions:

0.1 -> 0.0 0011 0011 0011 0011... (0011 repeats)
0.2 -> 0.0011 0011 0011 0011... (0011 repeats)

In IEEE 754 64-bit double precision, the significand holds at most 53 binary digits (52 stored bits plus the hidden leading 1), so both operands are rounded before they are added, and their sum is rounded again. The stored result is:

0.01 0011 0011 0011 0011 0011 0011 0011 0011 0011 0011 0011 0011 0100

Converted back to decimal, this rounded binary number is 0.30000000000000004. This is why the arithmetic produces an error.
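The effect is easy to reproduce. The sketch below also shows one common way to compare floating-point results with a tolerance; `nearlyEqual` is a hypothetical helper that is not part of the original article, and using `Number.EPSILON` as the tolerance is an assumption that only suits values of magnitude around 1.

```javascript
const sum = 0.1 + 0.2;

console.log(sum);             // 0.30000000000000004
console.log(sum === 0.3);     // false
console.log(sum.toString(2)); // binary expansion of the stored sum

// Compare with a tolerance instead of strict equality.
// Number.EPSILON is the gap between 1 and the next larger double.
function nearlyEqual(a, b, tolerance = Number.EPSILON) {
  return Math.abs(a - b) < tolerance;
}

console.log(nearlyEqual(sum, 0.3)); // true
```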

Solutions

There are several well-known solutions:

Using toFixed

The easiest way to handle decimals is to use toFixed:

(0.1 + 0.2).toFixed(1) // '0.3'

Although this is simple, the results are sometimes inaccurate:

1.35.toFixed(1)       // '1.4'      correct
1.335.toFixed(2)      // '1.33'     wrong
1.3335.toFixed(3)     // '1.333'    wrong
1.33335.toFixed(4)    // '1.3334'   correct
1.333335.toFixed(5)   // '1.33333'  wrong
1.3333335.toFixed(6)  // '1.333333' wrong
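The inconsistent results follow from the storage rules described earlier: toFixed rounds the value that is actually stored, not the literal that was typed. A quick way to see this, as a sketch that is not part of the original article, is to print more digits with `toPrecision`:

```javascript
// Print 20 significant digits of the stored values.
const storedA = (1.35).toPrecision(20);  // slightly above 1.35
const storedB = (1.335).toPrecision(20); // slightly below 1.335

console.log(storedA);
console.log(storedB);

// toFixed works from those stored values, so the "5" is not always a tie.
console.log((1.35).toFixed(1));  // "1.4"
console.log((1.335).toFixed(2)); // "1.33"
```

Because 1.335 is stored as a number just below 1.335, rounding it to two decimal places correctly yields 1.33 from the machine's point of view, even though it looks wrong to a human reader.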
Converting to integer operations

The idea is to convert decimals into whole numbers and then compute with them.

/**
 * Round a number to n decimal places.
 * @param x The number to round
 * @param n The number of decimal places to keep
 * @returns The rounded number
 */
function roundFractional(x, n) {
  return Math.round(x * Math.pow(10, n)) / Math.pow(10, n);
}

Math.round rounds the scaled value to the nearest integer, so this step not only avoids magnifying the error but also guarantees that the intermediate value is a whole number. If you want to round down instead, use the Math.floor function.
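A short usage sketch shows both where this approach helps and where it still fails, because the multiplication itself is a floating-point operation. The function is repeated so the example is self-contained; `roundViaExponent` is a hypothetical variant, not from the original article, that scales through exponent notation in a string instead of multiplying.

```javascript
function roundFractional(x, n) {
  return Math.round(x * Math.pow(10, n)) / Math.pow(10, n);
}

console.log(roundFractional(0.1 + 0.2, 2)); // 0.3
console.log(1.005 * 100);                   // 100.49999999999999
console.log(roundFractional(1.005, 2));     // 1 (1.01 was expected)

// Hypothetical variant: shift the decimal point by editing the exponent
// in the string form, so no floating-point multiplication is involved.
// Assumes String(x) is in plain notation (it breaks for inputs like 1e-7).
function roundViaExponent(x, n) {
  const scaled = Math.round(Number(x + 'e' + n));
  return Number(scaled + 'e-' + n);
}

console.log(roundViaExponent(1.005, 2));    // 1.01
```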

Converting to strings

Most third-party libraries are built on this approach, and they also support very large numbers. This is the recommended solution.

/**
 * floatObj provides four methods: add / subtract / multiply / divide
 * floatObj.add(0.1, 0.2)       >> 0.3
 * floatObj.multiply(19.9, 100) >> 1990
 */
var floatObj = function() {

    /* * check whether obj is an integer */
    function isInteger(obj) {
        return Math.floor(obj) === obj
    }

    /*
     * Converts a floating-point number to an integer and returns the integer and its multiplier.
     * For example, 3.14 >> 314 with a multiplier of 100.
     * @param floatNum {number} the decimal number
     * @return {object} e.g. {times: 100, num: 314}
     */
    function toInteger(floatNum) {
        var ret = {times: 1, num: 0}
        if (isInteger(floatNum)) {
            ret.num = floatNum
            return ret
        }
        var strfi  = floatNum + ''
        var dotPos = strfi.indexOf('.')
        var len    = strfi.slice(dotPos + 1).length
        var times  = Math.pow(10, len)
        var intNum = Number(floatNum.toString().replace('.', ''))
        ret.times  = times
        ret.num    = intNum
        return ret
    }

    /*
     * Core method: implements addition, subtraction, multiplication and division
     * without loss of precision.
     *
     * @param a {number} operand 1
     * @param b {number} operand 2
     * @param digits {number} precision, e.g. 2 to keep two decimal places
     *        (accepted but not used by this implementation)
     * @param op {string} operation type (add/subtract/multiply/divide)
     */
    function operation(a, b, digits, op) {
        var o1 = toInteger(a)
        var o2 = toInteger(b)
        var n1 = o1.num
        var n2 = o2.num
        var t1 = o1.times
        var t2 = o2.times
        var max = t1 > t2 ? t1 : t2
        var result = null
        switch (op) {
            case 'add':
                if (t1 === t2) { // The two decimal places are the same
                    result = n1 + n2
                } else if (t1 > t2) { // O1 is greater than O2
                    result = n1 + n2 * (t1 / t2)
                } else { // O1 is less than O2
                    result = n1 * (t2 / t1) + n2
                }
                return result / max
            case 'subtract':
                if (t1 === t2) {
                    result = n1 - n2
                } else if (t1 > t2) {
                    result = n1 - n2 * (t1 / t2)
                } else {
                    result = n1 * (t2 / t1) - n2
                }
                return result / max
            case 'multiply':
                result = (n1 * n2) / (t1 * t2)
                return result
            case 'divide':
                result = (n1 / n2) * (t2 / t1)
                return result
        }
    }

    // Add, subtract, multiply, divide
    function add(a, b, digits) {
        return operation(a, b, digits, 'add')
    }
    function subtract(a, b, digits) {
        return operation(a, b, digits, 'subtract')
    }
    function multiply(a, b, digits) {
        return operation(a, b, digits, 'multiply')
    }
    function divide(a, b, digits) {
        return operation(a, b, digits, 'divide')
    }

    // exports
    return {
        add: add,
        subtract: subtract,
        multiply: multiply,
        divide: divide
    }
}();
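The same scale-to-integer idea can be sketched with strings and BigInt, which avoids passing through a floating-point number at all. This is only an illustration under stated assumptions, not the approach of any particular library: `addDecimalStrings` is a hypothetical helper that accepts plain non-negative decimal strings (no sign, no exponent notation).

```javascript
// Add two non-negative decimal strings exactly by scaling both to integers.
// addDecimalStrings is a hypothetical helper; inputs look like "19.9" or "100".
function addDecimalStrings(a, b) {
  const [aInt, aFrac = ''] = a.split('.');
  const [bInt, bFrac = ''] = b.split('.');
  const scale = Math.max(aFrac.length, bFrac.length);

  // Pad the fractional parts to the same length, then drop the decimal point.
  const aScaled = BigInt(aInt + aFrac.padEnd(scale, '0'));
  const bScaled = BigInt(bInt + bFrac.padEnd(scale, '0'));

  // Keep at least one digit in front of the decimal point.
  const sum = (aScaled + bScaled).toString().padStart(scale + 1, '0');
  if (scale === 0) return sum;

  // Reinsert the decimal point and trim trailing zeros.
  const intPart = sum.slice(0, sum.length - scale);
  const fracPart = sum.slice(sum.length - scale).replace(/0+$/, '');
  return fracPart ? intPart + '.' + fracPart : intPart;
}

console.log(addDecimalStrings('0.1', '0.2'));   // "0.3"
console.log(addDecimalStrings('19.9', '0.15')); // "20.05"
console.log(addDecimalStrings('1.5', '2.5'));   // "4"
```

Taking strings as input matters here: once a value such as 0.1 has been written as a number literal, it has already been rounded to the nearest double.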
Third-party library (big.js) source analysis

x = new Big(123.4567)
y = Big('123456.7e-3')                 // 'new' is optional
z = new Big(x)
x.eq(y) && x.eq(z) && y.eq(z)          // true

0.3 - 0.1                              // 0.19999999999999998
x = new Big(0.3)
x.minus(0.1)                           // "0.2"
x                                      // "0.3"

function Big(n) {
  var x = this;

  // Support calling the constructor without the new operator
  if (!(x instanceof Big)) return n === UNDEFINED ? _Big_() : new Big(n);

  // Check whether the incoming value is already an instance of Big; if so, copy it
  if (n instanceof Big) {
    x.s = n.s;
    x.e = n.e;
    x.c = n.c.slice();
  } else {
    if (typeof n !== 'string') {
      if (Big.strict === true) {
        throw TypeError(INVALID + 'number');
      }

      // Detect -0 (String(-0) would give '0'); otherwise convert to a string
      n = n === 0 && 1 / n < 0 ? '-0' : String(n);
    }

    // The parse function accepts only string arguments
    parse(x, n);
  }

  x.constructor = Big;
}

function parse(x, n) {
  var e, i, nl;

  if (!NUMERIC.test(n)) {
    throw Error(INVALID + 'number');
  }

  // Determine the sign: -1 for negative, 1 for positive
  x.s = n.charAt(0) == '-' ? (n = n.slice(1), -1) : 1;

  // Check whether there is a decimal point
  if ((e = n.indexOf('.')) > -1) n = n.replace('.', '');

  // Check whether the number is in exponential notation
  if ((i = n.search(/e/i)) > 0) {

    // Determine the exponent
    if (e < 0) e = i;
    e += +n.slice(i + 1);
    n = n.substring(0, i);
  } else if (e < 0) {

    // The number is an integer
    e = n.length;
  }

  nl = n.length;

  // Count leading zeros, as in 0123
  for (i = 0; i < nl && n.charAt(i) == '0';) ++i;

  if (i == nl) {

    // Zero.
    x.c = [x.e = 0];
  } else {

    // Find the trailing zeros, as in 1.230
    for (; nl > 0 && n.charAt(--nl) == '0';);
    x.e = e - i - 1;
    x.c = [];

    // Convert the string into an array of digits, without leading and trailing zeros
    for (e = 0; i <= nl;) x.c[e++] = +n.charAt(i++);
  }

  return x;
}
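To make the sign / exponent / coefficient representation concrete, here is a reduced sketch of the same idea. `parseDecimal` is a hypothetical function, not the library's code: it assumes a plain decimal string (optional leading "-", optional ".", no exponent notation) and does no validation.

```javascript
// Simplified sketch of parsing a decimal string into { s, e, c }:
//   s: sign (1 or -1)
//   e: base-10 exponent of the first significant digit
//   c: array of significant digits without leading or trailing zeros
function parseDecimal(str) {
  const s = str.charAt(0) === '-' ? -1 : 1;
  if (s === -1) str = str.slice(1);

  let pointIndex = str.indexOf('.');
  if (pointIndex < 0) pointIndex = str.length; // integer: point sits after the last digit
  const digits = str.replace('.', '');

  // Skip leading zeros.
  let start = 0;
  while (start < digits.length && digits.charAt(start) === '0') start++;
  if (start === digits.length) return { s, e: 0, c: [0] }; // the value is zero

  // Ignore trailing zeros.
  let end = digits.length - 1;
  while (digits.charAt(end) === '0') end--;

  const c = [];
  for (let i = start; i <= end; i++) c.push(Number(digits.charAt(i)));

  return { s, e: pointIndex - start - 1, c };
}

console.log(parseDecimal('123.4560')); // { s: 1, e: 2, c: [ 1, 2, 3, 4, 5, 6 ] }
console.log(parseDecimal('-0.00120')); // { s: -1, e: -3, c: [ 1, 2 ] }
```

Read the result as 1.23456 × 10^2 and -1.2 × 10^-3: the digits are stored exactly as decimal digits, so no binary rounding takes place.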

P.plus = P.add = function (y) {
  var t,
    x = this,
    Big = x.constructor,
    a = x.s,
    // Both operands are converted to Big instances for uniform handling
    b = (y = new Big(y)).s;

  // If the signs differ (one positive, one negative), delegate to minus
  if (a != b) {
    y.s = -b;
    return x.minus(y);
  }

  var xe = x.e,
    xc = x.c,
    ye = y.e,
    yc = y.c;

  // Check whether either value is 0
  if (!xc[0] || !yc[0]) return yc[0] ? y : new Big(xc[0] ? x : a * 0);

  // Copy the array to avoid modifying the original instance
  xc = xc.slice();

  // Prepend zeros so that both exponents are equal
  // Note that reverse is faster than repeated unshift calls
  if (a = xe - ye) {
    if (a > 0) {
      ye = xe;
      t = yc;
    } else {
      a = -a;
      t = xc;
    }

    t.reverse();
    for (; a--;) t.push(0);
    t.reverse();
  }

  // Point xc at the longer array for the addition loop below
  if (xc.length - yc.length < 0) {
    t = yc;
    yc = xc;
    xc = t;
  }

  a = yc.length;

  // Perform the addition digit by digit, storing the result in xc
  for (b = 0; a; xc[a] %= 10) b = (xc[--a] = xc[a] + yc[a] + b) / 10 | 0;

  // No need to check for zero, because +x + +y != 0 and -x + -y != 0
  if (b) {
    xc.unshift(b);
    ++ye;
  }

  // Remove trailing zeros
  for (a = xc.length; xc[--a] === 0;) xc.pop();

  y.c = xc;
  y.e = ye;

  return y;
};
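The steps of the addition (align the exponents, add digit by digit with a carry, trim zeros) are easier to follow in a more explicit form. The following is an illustrative sketch only: `addPositive` is a hypothetical helper that handles two positive values given as `{ e, c }` objects, and it favors readability over the compact style of the library code.

```javascript
// Add two positive values given as { e, c }, where e is the base-10 exponent
// of the first digit and c is the array of significant digits.
function addPositive(x, y) {
  const xc = x.c.slice();
  const yc = y.c.slice();
  let e = Math.max(x.e, y.e);

  // Step 1: prepend zeros to the value with the smaller exponent.
  for (let k = x.e; k < e; k++) xc.unshift(0);
  for (let k = y.e; k < e; k++) yc.unshift(0);

  // Step 2: append zeros so both arrays have the same length.
  while (xc.length < yc.length) xc.push(0);
  while (yc.length < xc.length) yc.push(0);

  // Step 3: add from the rightmost digit, carrying into the next column.
  let carry = 0;
  for (let i = xc.length - 1; i >= 0; i--) {
    const total = xc[i] + yc[i] + carry;
    xc[i] = total % 10;
    carry = Math.floor(total / 10);
  }
  if (carry) {
    xc.unshift(carry);
    e++;
  }

  // Step 4: drop trailing zeros.
  while (xc.length > 1 && xc[xc.length - 1] === 0) xc.pop();

  return { e, c: xc };
}

// 0.1 + 0.2 = 0.3
console.log(addPositive({ e: -1, c: [1] }, { e: -1, c: [2] }));      // { e: -1, c: [ 3 ] }
// 9.5 + 0.75 = 10.25
console.log(addPositive({ e: 0, c: [9, 5] }, { e: -1, c: [7, 5] })); // { e: 1, c: [ 1, 0, 2, 5 ] }
```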

P.times = P.mul = function (y) {
  var c,
    x = this,
    Big = x.constructor,
    xc = x.c,
    yc = (y = new Big(y)).c,
    a = xc.length,
    b = yc.length,
    i = x.e,
    j = y.e;

  // Compare the signs to determine whether the result is positive or negative
  y.s = x.s == y.s ? 1 : -1;

  // If either value is 0, return a signed 0
  if (!xc[0] || !yc[0]) return new Big(y.s * 0);

  // Initialize the exponent of the result to x.e + y.e, the same rule used
  // to place the decimal point when multiplying two decimals by hand
  y.e = i + j;

  // Swap so that xc is never shorter than yc, since the inner loop traverses xc
  if (a < b) {
    c = xc;
    xc = yc;
    yc = c;
    j = a;
    a = b;
    b = j;
  }

  // Initialize the result array with zeros
  for (c = new Array(j = a + b); j--;) c[j] = 0;

  // i starts at the length of yc (the shorter array after the swap)
  for (i = b; i--;) {
    b = 0;

    // a is the length of xc (the longer array after the swap)
    for (j = a + i; j > i;) {

      // Multiply one digit of yc by one digit of xc, add the existing value and the carry
      b = c[j] + yc[i] * xc[j - i - 1] + b;
      c[j--] = b % 10;

      // Carry
      b = b / 10 | 0;
    }

    c[j] = b;
  }

  // If there is a final carry, increase the exponent (y.e); otherwise remove the leading 0
  if (b) ++y.e;
  else c.shift();

  // Remove trailing zeros
  for (i = c.length; !c[--i];) c.pop();
  y.c = c;

  return y;
};
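The multiplication is ordinary schoolbook multiplication on digit arrays. The sketch below restates it in a more explicit style; `multiplyPositive` is a hypothetical helper for two positive `{ e, c }` values and is not the library's code.

```javascript
// Multiply two positive values given as { e, c } using schoolbook multiplication.
function multiplyPositive(x, y) {
  const xc = x.c;
  const yc = y.c;
  // The product of an m-digit and an n-digit number has at most m + n digits.
  const c = new Array(xc.length + yc.length).fill(0);

  // Multiply every digit of yc by every digit of xc, right to left.
  for (let i = yc.length - 1; i >= 0; i--) {
    let carry = 0;
    for (let j = xc.length - 1; j >= 0; j--) {
      const total = c[i + j + 1] + yc[i] * xc[j] + carry;
      c[i + j + 1] = total % 10;
      carry = Math.floor(total / 10);
    }
    c[i] += carry;
  }

  // Exponents add; a non-zero leading slot means the product gained a digit.
  let e = x.e + y.e;
  if (c[0] !== 0) e++;
  else c.shift();

  // Drop trailing zeros.
  while (c.length > 1 && c[c.length - 1] === 0) c.pop();

  return { e, c };
}

// 19.9 * 100 = 1990
console.log(multiplyPositive({ e: 1, c: [1, 9, 9] }, { e: 2, c: [1] })); // { e: 3, c: [ 1, 9, 9 ] }
// 0.5 * 0.5 = 0.25
console.log(multiplyPositive({ e: -1, c: [5] }, { e: -1, c: [5] }));     // { e: -1, c: [ 2, 5 ] }
```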

P.round = function (dp, rm) {
  if (dp === UNDEFINED) dp = 0;
  else if (dp !== ~~dp || dp < -MAX_DP || dp > MAX_DP) {
    throw Error(INVALID_DP);
  }
  return round(new this.constructor(this), dp + this.e + 1, rm);
};

function round(x, sd, rm, more) {
  var xc = x.c;

  if (rm === UNDEFINED) rm = Big.RM;
  if (rm !== 0 && rm !== 1 && rm !== 2 && rm !== 3) {
    throw Error(INVALID_RM);
  }

  if (sd < 1) {
    // Fallback case: fewer than 1 significant digit is requested, so the result keeps a single digit
    more =
      rm === 3 && (more || !!xc[0]) || sd === 0 && (
      rm === 1 && xc[0] >= 5 ||
      rm === 2 && (xc[0] > 5 || xc[0] === 5 && (more || xc[1] !== UNDEFINED))
    );

    xc.length = 1;

    if (more) {

      // 1, 0.1, 0.01, 0.001, 0.0001, etc.
      x.e = x.e - sd + 1;
      xc[0] = 1;
    } else {

      // Zero
      xc[0] = x.e = 0;
    }
  } else if (sd < xc.length) {

    // xc[sd] is the first digit beyond the precision; it decides whether to round up
    more =
      rm === 1 && xc[sd] >= 5 ||
      rm === 2 && (xc[sd] > 5 || xc[sd] === 5 &&
        (more || xc[sd + 1] !== UNDEFINED || xc[sd - 1] & 1)) ||
      rm === 3 && (more || !!xc[0]);

    // Remove the digits beyond the required precision
    xc.length = sd--;

    // Round up?
    if (more) {

      // Rounding up may carry into the previous digit, which is then reset to 0
      for (; ++xc[sd] > 9;) {
        xc[sd] = 0;
        if (!sd--) {
          ++x.e;
          xc.unshift(1);
        }
      }
    }

    // Remove trailing zeros
    for (sd = xc.length; !xc[--sd];) xc.pop();
  }

  return x;
}

In the normal path, the digits beyond the requested precision are discarded; if the value must be rounded up, the carry moves into the preceding digits and any digit that overflows is reset to 0.

From the implementation of the internal round function, we can see that it begins with fallback checks that filter out two special cases. One is an invalid rounding-mode parameter, for which it throws an exception directly. The other is a precision of less than 1 significant digit, where the result falls back to a single digit: either 0 or a 1 at the rounding position.

In big.js, all rounding operations call the internal round function, so let's take the round method of the API as an example. This method takes two parameters: the first, dp, is the number of decimal places to keep, and the second, rm, is the rounding mode.
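The rounding logic can be illustrated with a reduced sketch that supports a single mode. `roundHalfUp` is a hypothetical helper, not the library's code: it handles only positive `{ e, c }` values and only the half-up rule, which corresponds to the `rm === 1` branch in the source above.

```javascript
// Round a positive { e, c } value to dp decimal places, rounding half up.
function roundHalfUp(x, dp) {
  const c = x.c.slice();
  let e = x.e;
  const sd = dp + e + 1; // number of significant digits to keep

  // Nothing to discard: the value already fits in the requested precision.
  if (sd >= c.length) return { e, c };

  // Fallback case: fewer than one significant digit survives.
  if (sd < 1) {
    const up = sd === 0 && c[0] >= 5;
    return up ? { e: e - sd + 1, c: [1] } : { e: 0, c: [0] };
  }

  // The first discarded digit decides whether to round up.
  const up = c[sd] >= 5;
  c.length = sd;

  if (up) {
    let i = sd - 1;
    // Propagate the carry: a digit that passes 9 becomes 0.
    while (++c[i] > 9) {
      c[i] = 0;
      if (i === 0) {
        e++;
        c.unshift(1);
        break;
      }
      i--;
    }
  }

  // Drop trailing zeros.
  while (c.length > 1 && c[c.length - 1] === 0) c.pop();

  return { e, c };
}

// 1.335 rounded to 2 decimal places is 1.34
console.log(roundHalfUp({ e: 0, c: [1, 3, 3, 5] }, 2)); // { e: 0, c: [ 1, 3, 4 ] }
// 9.995 rounded to 2 decimal places is 10
console.log(roundHalfUp({ e: 0, c: [9, 9, 9, 5] }, 2)); // { e: 1, c: [ 1 ] }
```

Because the digits are decimal, 1.335 really does end in 5 here, so it rounds to 1.34, unlike the toFixed result shown earlier.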
