System.out.println(1.0f - 0.9f) 3… 2… 1…

Answer: 0.100000024

If you’re curious why the answer isn’t 0.1, read on ⬇️

How do you describe floating point numbers in computer science?

As we all know, in mathematics, scientific notation is used to express numbers that are very large or very small. Scientific notation looks like this:

a × 10^n, where 1 ≤ |a| < 10; a is called the significand (the significant digits)

Mapping scientific notation from the mathematical world onto floating point numbers in the computer world takes a slightly different form, to accommodate the realities of memory hardware and the change of radix (from decimal to binary).

In the process, the decimal exponent becomes the “exponent field” (order code), the significand becomes the “mantissa”, and, together with a dedicated sign bit under the computer’s binary number system, these constitute the three elements of computer scientific notation:

  • The sign bit
  • Exponent bits
  • Mantissa bits

It may not be intuitive stated this way, so let’s take a single-precision floating point number as an example. It occupies 4 bytes, 32 bits in total, and the three elements are laid out as follows: 1 sign bit, then 8 exponent bits, then 23 mantissa bits.

Let’s look at them one at a time:

  1. The sign bit

    The highest bit is 0 for positive and 1 for negative.

  2. Exponent bits

    The 8 bits to the right of the sign bit are used to represent the exponent. To be clear: in the IEEE 754 standard, the mainstream of the computer world, the exponent field stores the offset (biased) code corresponding to the exponent.

    According to Baidu Baike’s definition of offset code:

    Offset code (also called excess code) is the two’s complement with the sign bit inverted. Generally, “offset minus one” is used as the exponent field of floating point numbers. The offset is introduced to ensure that the machine zero of a floating point number is all 0 bits.

    [X]offset = X + 2^(n-1) (n is the number of bits of X, including the sign position; for the exponent field, n = 8). IEEE 754 single precision uses the “minus one” variant, i.e. a bias of 2^7 - 1 = 127, so an actual exponent of 3 is stored as 130.

  3. Mantissa bits

    As mentioned above, the mantissa bits hold the significant digits of the floating point number. For a normalized number, the highest mantissa bit is always 1 (think about it, think about it carefully…), so to save storage space this leading 1 is omitted. The 23 mantissa bits that are actually stored therefore represent 24 significant bits.

After introducing the three elements, let’s take a simple example to illustrate the points above: the decimal number “8.0” as represented in the computer world. 8.0 = 1.0 × 2^3, so the sign bit is 0, the exponent field is 3 + 127 = 130 (1000-0010), and the 23 mantissa bits are all 0 (the leading 1 is hidden).
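As a quick sanity check, here is a sketch (standard library only; Java guarantees IEEE 754 floats) that pulls the three fields out of a float with Float.floatToIntBits:

int bits = Float.floatToIntBits(8.0f);   // raw IEEE 754 bit pattern
int sign = bits >>> 31;                  // sign bit:       0
int exponent = (bits >>> 23) & 0xFF;     // exponent field: 130 = 3 + 127
int mantissa = bits & 0x7FFFFF;          // mantissa bits:  0 (leading 1 is hidden)
System.out.println(sign + " " + exponent + " " + mantissa); // 0 130 0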

Here is where things take a turn:

What about the decimal number “0.9”?

Answer: in binary, 0.9 is 0.111001100110011…, an infinitely repeating fraction. Normalized, 0.9 = 1.1100110011001100… × 2^-1, so the sign bit is 0, the exponent field is -1 + 127 = 126 (0111-1110), and the 23 mantissa bits, after rounding, are 110-0110-0110-0110-0110-0110.
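The same field extraction confirms this (again, just a sketch using the standard library):

int bits = Float.floatToIntBits(0.9f);
System.out.println(Integer.toBinaryString((bits >>> 23) & 0xFF)); // 1111110 (= 126)
System.out.println(Integer.toBinaryString(bits & 0x7FFFFF));      // 11001100110011001100110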

This example is just to show you:

Some floating point numbers cannot be accurately represented in finite binary scientific notation.
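You can see the exact value that ends up stored by handing the float to BigDecimal (widening 0.9f to double is lossless, and the BigDecimal(double) constructor preserves the binary value exactly):

import java.math.BigDecimal;

System.out.println(new BigDecimal(0.9f)); // 0.89999997615814208984375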

The addition and subtraction of floating point numbers

How did we add and subtract decimals when we were in primary school?

1. To add or subtract decimals, first align the decimal points of the numbers (that is, align the digits that share the same place value).

2. Add or subtract according to the rules of integer arithmetic, then place the decimal point in the result directly below the aligned decimal points.

As you can see from the above, the most important part of adding and subtracting decimals is aligning the decimal points. The same goes for floating point addition and subtraction: mapping a floating-point calculation onto scientific notation, you first have to make the exponents equal. The technical term for this is the exponent alignment operation:

ΔE = Ex - Ey. ΔE is added to the smaller exponent to make it equal to the larger one; at the same time, the mantissa belonging to the smaller exponent is shifted right by the same number of bits, so that the value of the floating point number stays unchanged.

  • The rule of alignment is “small exponent aligns to large exponent”. The reason is that if the large exponent were aligned to the small one, the high-order bits of the mantissa would have to be shifted out, whereas aligning small to large shifts out only low-order bits, so the precision loss is smaller (see the sketch after this list).
  • When ΔE = 0, the exponents are already equal and no alignment is needed.
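Here is a tiny sketch of that shift, using the significand derived for 0.9f above (illustration only):

int sig = 0b111001100110011001100110; // 24-bit significand of 0.9f, exponent -1
int aligned = sig >> 1;               // align to exponent 0: shift right by ΔE = 1
// The shifted-out low bit (a 0 here) is what a guard bit would preserve;
// shifting the larger-exponent operand instead would discard a high bit.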

Before performing the alignment, we first check whether either of the two operands is 0. Floating-point arithmetic is complicated enough that Google’s Performance Tips include this advice:

Avoid using floating-point

When one of the numbers is 0, the other value involved in the calculation is directly returned as the result.

After the alignment is complete, the mantissas are operated on (for addition they are summed directly; if a number is negative, it is first converted to two’s complement and then summed), similar to decimal arithmetic.

If the result of the steps above still satisfies the normalized form

a × 2^n, where 1 ≤ |a| < 2,

nothing more needs to be done. If it does not, the mantissa must be shifted (left or right) until it fits that form, and this step can also lose precision. It is called result normalization: shifting the mantissa right is “right normalization”, shifting it left is “left normalization”.

To make up for the precision lost during alignment and result normalization, the bits that are shifted out are saved in what are called guard bits. After result normalization, rounding is performed according to the guard bits.

To sum up, the process is roughly: zero check -> exponent alignment -> mantissa operation -> result normalization -> rounding.
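Below is a minimal sketch of this whole pipeline in Java. It assumes both operands are positive, normalized floats with the smaller one subtracted from the larger; it is an illustration, not a full IEEE 754 implementation (guard bits and rounding are omitted, which happens to be harmless for this particular pair):

int a = Float.floatToIntBits(1.0f);
int b = Float.floatToIntBits(0.9f);
int expA = (a >>> 23) & 0xFF;          // 127
int expB = (b >>> 23) & 0xFF;          // 126
int sigA = (a & 0x7FFFFF) | 0x800000;  // restore the hidden leading 1
int sigB = (b & 0x7FFFFF) | 0x800000;

// zero check: neither operand is 0, so continue
// exponent alignment: the smaller exponent aligns to the larger
int exp = Math.max(expA, expB);
sigA >>= exp - expA;
sigB >>= exp - expB;

// mantissa operation: 1.0 + (-0.9), i.e. a two's-complement subtraction
int sig = sigA - sigB;

// result normalization: shift left until the hidden-1 position is set again
while ((sig & 0x800000) == 0) { sig <<= 1; exp--; }

// reassemble (sign bit 0) and compare against the hardware result
System.out.println(Float.intBitsToFloat((exp << 23) | (sig & 0x7FFFFF))); // 0.100000024
System.out.println(1.0f - 0.9f);                                          // 0.100000024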

With the above concepts in mind, let’s move on to the question raised in the title:

1-0.9 ≠ 0.1, why?

Why is 1 minus 0.9 not equal to 0.1?

First of all, let’s be clear that subtraction in computers is often translated into addition. For example, 1.0-0.9 is equivalent to 1.0 + (-0.9).

Let’s first write out the binary encodings of 1.0 and -0.9:

1.0: 0 | 0111-1111 | 000-0000-0000-0000-0000-0000
-0.9: 1 | 0111-1110 | 110-0110-0110-0110-0110-0110

As mentioned above, the highest mantissa bit hides a 1, so the actual mantissa of 1.0 is:

1000-0000-0000-0000-0000-0000

The actual mantissa of -0.9 is:

1110-0110-0110-0110-0110-0110

Next we walk through: zero check -> exponent alignment -> mantissa summation -> result normalization -> rounding.

Zero detection

Obviously, neither operand is 0, so we skip this step.

Exponent alignment

The exponent field of 1.0 is 127, and the exponent field of -0.9 is 126. Comparing the two, the two’s complement of -0.9’s mantissa must be shifted one bit to the right, with the vacated high-order bit filled with 1 (sign extension), so that its exponent becomes 127 and the “decimal point alignment” effect is achieved. After the shift, the two’s complement of -0.9’s mantissa is:

1000-1100-1100-1100-1100-1101

Mantissa sum

Convert the mantissas of 1.0 and -0.9 into two’s complement, then add them bit by bit (after the alignment is complete, the exponent field no longer participates in the operation; only the mantissa bits and the sign bit take part):

The result of the mantissa-bit operation is:

0000-1100-1100-1100-1100-1101

Result normalization

The result of the mantissa summation is not in normalized form (the highest bit of the mantissa must be 1; think about why), so we need to shift the result 4 bits to the left and subtract 4 from the exponent to normalize it.

After this step, the exponent field is 123 (binary 0111-1011), and the mantissa is

1100-1100-1100-1100-1101-0000

Then, hiding the highest bit of the mantissa, it becomes:

100-1100-1100-1100-1101-0000

The final result

Finally, the result of 1.0 - 0.9 is: sign bit 0, exponent field 0111-1011, mantissa bits 100-1100-1100-1100-1101-0000, which corresponds to the decimal value 0.100000024.
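You can confirm these three fields directly from the bit pattern (Integer.toBinaryString drops leading zeros, which swallows the 0 sign bit):

int bits = Float.floatToIntBits(1.0f - 0.9f);
System.out.println(Integer.toBinaryString(bits));
// 111101110011001100110011010000
// = sign 0 | exponent 0111-1011 | mantissa 100-1100-1100-1100-1101-0000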

The adverse consequences of precision loss

Since there is a loss of precision when using floating-point numbers, what is the impact on our daily development?

Using floating-point comparisons to control business flow often leads to unpredictable behavior.

Imagine an e-commerce App where submitting an order requires the front end to verify that the user’s balance is sufficient to pay for the order, and if not, disable the submit order button.

Without an understanding of precision loss, it is easy to write a check like this:

btSubmit.setEnabled(balance >= orderAmount)

If the balance or the order amount has lost precision at this point, it can happen that the user’s balance is actually sufficient to pay for the order, but the submit button is disabled by the faulty front-end check, so the user cannot place the order (don’t ask me how I know this scenario in such detail…).

How to avoid the adverse consequences of precision loss?

Avoid unnecessary use of floating point numbers.

Using floating point numbers brings two headaches: performance loss and precision loss. As a rule of thumb, floating-point arithmetic is about two times slower than integer arithmetic on Android devices. So unless floating point is truly unavoidable, it is better to use an integer type instead.

When floating point numbers are used, use double instead of float.

First, in terms of speed, float and double perform the same on modern hardware, and when weighing time against space, I believe most people would rather spend space to buy time. Meanwhile, thanks to its larger storage, a double-precision floating point number is far more precise than a single-precision one.
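A quick illustration: the same subtraction done in double is still inexact, but the error is pushed much further out:

System.out.println(1.0f - 0.9f); // 0.100000024
System.out.println(1.0d - 0.9d); // 0.09999999999999998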

For example, in the scenario from the previous section, we can simply use cents (the smallest currency unit) as the unit, converting both the balance and the order amount into integers before comparing them, thereby avoiding the unpredictable consequences of precision loss.
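A sketch of that fix, reusing the btSubmit check from above (the decimal-string amounts are assumptions for illustration):

import java.math.BigDecimal;

long balanceCents = new BigDecimal("100.00").movePointRight(2).longValueExact();    // 10000
long orderAmountCents = new BigDecimal("99.99").movePointRight(2).longValueExact(); // 9999
btSubmit.setEnabled(balanceCents >= orderAmountCents); // exact integer comparison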

Conclusion

To sum up: we cannot avoid floating point precision loss inside the computer, but we can sensibly avoid the adverse consequences it causes, for example by not using the comparison of two floating point numbers as the basis for controlling business flow.