C++ floating-point precision determination

Table of contents

- 1. Introduction
- 2. Floating-point representation
  - 1. IEEE 754
  - 2. Single precision and double precision
  - 3. Single-precision floating-point representation
  - 4. Example
  - 5. Code testing
- 3. Floating-point comparison
  - 1. Precision definition
  - 2. Equal to
  - 3. Not equal to
  - 4. Greater than or equal to
  - 5. Less than or equal to
  - 6. Less than
  - 7. Greater than

1. Introduction

What does the following code print?

#include <cstdio>

int main() {
	double x = 0;
	for (int i = 0; i < 10; ++i) {
		x += 0.1;   // 0.1 has no exact binary representation
	}
	std::printf("%d\n", x == 1);
}

The output is as follows:

0

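The result is 0 because of floating-point rounding error. Floating-point numbers are stored with limited precision, so a value such as 0.1 cannot be stored exactly, and the small errors add up over the ten additions. For this reason '==' should not be used to decide whether two floating-point numbers are equal. Let us first look at how floating-point numbers are represented.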

2. Floating-point representation

1. IEEE 754

The IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754) has been the most widely used floating-point standard since the 1980s and is implemented by many CPUs and floating-point units. It defines the formats for representing floating-point numbers (including negative zero, -0, and denormal numbers), special values (infinity, Inf, and not-a-number, NaN), and the floating-point operations on these values.
Under this standard, a normalized binary floating-point number can be written as:

$$\mathit{Value} = (-1)^{\mathit{Sign}} \times \mathit{Fraction} \times 2^{\mathit{Exponent}}$$

Sign is the sign bit, 0 for a positive number and 1 for a negative one. Fraction is the mantissa, which in binary scientific notation always starts with 1, for example $(1.010100111)_2$. Exponent is the power of two: an exponent of $(1001)_2 = 9$, for example, means a factor of $2^9$.

2. Single precision and double precision

In C++, single precision corresponds to float and double precision to double.
Single precision has a 23-bit mantissa and an 8-bit exponent; double precision has a 52-bit mantissa and an 11-bit exponent. Together with one sign bit, these add up to 32 and 64 bits respectively.
Because both formats have the same structure, only single precision is described below.

3. Single-precision floating-point representation

A single-precision number occupies 32 bits, divided into three fields:
1) S occupies the highest bit (bit 31) and is the sign bit: 1 for a negative number, 0 for a positive one.
2) E occupies bits 30 through 23 (8 bits) and holds the biased exponent, that is, the actual exponent plus 127. The bias is needed because the exponent may be negative, while E is stored as an unsigned value.
3) F occupies the remaining 23 bits (bits 22 through 0) and holds the mantissa of the binary scientific notation. A normalized number always begins with "1.", so that leading 1 is not stored, and the significand effectively has 24 bits.

4. Example

[Example] Find the binary storage format of the floating-point number $-18.375$.

1) Set the sign aside for now (it will become a 1 in the highest bit at the end), so what remains is to write 18.375 in binary.
2) The integer part: $18 = 16 + 2 = 2^4 + 2^1 = (10010)_2$.
3) The fractional part: $0.375 = 0.25 + 0.125 = 2^{-2} + 2^{-3} = (0.011)_2$.
4) Combining the two: $(18.375)_{10} = (10010.011)_2$.
5) In binary scientific notation: $(18.375)_{10} = (1.0010011)_2 \times 2^{(100)_2} = (1.0010011)_2 \times 2^4$.
6) The mantissa is $F = 0010011$ (the leading "1." of scientific notation is dropped), padded with zeros to 23 bits. Adding 127 to the exponent gives $E = (100)_2 + (1111111)_2 = (10000011)_2$, i.e. $4 + 127 = 131$. The sign bit is $S = 1$.

S: 1

E: 10000011

F: 00100110000000000000000

7) Concatenating S, E and F gives:

SEF = 11000001 10010011 00000000 00000000

5. Code testing

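In C/C++, the bit pattern of a float can be copied into an unsigned integer of the same size and printed in hexadecimal to check the result above. The snippet assumes a 32-bit unsigned int and needs <cstring> and <cstdio>. Copying with memcpy is well defined; the common shortcut *((unsigned int *)&a) reads a float through an unsigned int pointer, which breaks C++'s strict aliasing rule and is undefined behavior.

	float a = -18.375f;
	unsigned int v;
	std::memcpy(&v, &a, sizeof v);   // copy the 32 bits of a into v
	std::printf("%08X\n", v);        // prints C1930000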

3. Floating-point comparison

1. Precision definition

In C++, the literal 1e-6 means $10^{-6}$, i.e. 0.000001. It is used below as the tolerance eps:

#define eps 1e-6

2. Equal to

Because of representation error, floating-point numbers should not be compared with '=='. Instead, subtract one from the other and take the absolute value of the difference: if it is smaller than the chosen tolerance eps, the two numbers are treated as equal.

#include <cmath>

bool EQ(double a, double b) {   // EQual
	return std::fabs(a - b) < eps;
}

3. Not equal to

"Not equal" is simply the negation of "equal":

bool NEQ(double a, double b) {  // NotEQual
	return !EQ(a, b);
}

4. Greater than or equal to

"Greater than or equal to" means "greater than" or "equal to", so it is split into the following form:

bool GET(double a, double b) {    // GreaterEqualThan
	return a > b || EQ(a, b);
}

5. Less than or equal to

"Less than or equal to" means "less than" or "equal to", so it is split into the following form:

bool SET(double a, double b) {   // SmallerEqualThan
	return a < b || EQ(a, b);
}

6. Less than

"Less than" is the negation of "greater than or equal to": a must be less than b and also not approximately equal to it, i.e. $a < b$ and $|a - b| \ge \mathit{eps}$:

bool ST(double a, double b) {   // SmallerThan
	return a < b && NEQ(a, b);
}

7. Greater than

"Greater than" is the negation of "less than or equal to": a must be greater than b and also not approximately equal to it, i.e. $a > b$ and $|a - b| \ge \mathit{eps}$:

bool GT(double a, double b) {   // GreaterThan
	return a > b && NEQ(a, b);
}
