The representation of floating point numbers

Like scientific notation in the decimal system, binary floating-point numbers are represented as $x \times 2^y$, as illustrated by the concrete example below.

Example

Let’s say eight bits are used to represent a floating-point number: one sign bit, four exponent bits, and three mantissa (fraction) bits. The bias is therefore $2^3 - 1 = 7$ (the 3 comes from the width of the exponent field minus 1, i.e. $4 - 1$). Before working through this example, I want to introduce three kinds of numbers (a small decoding sketch follows the list):

  • Denormalized numbers: all numbers whose exponent field is 0 are denormalized. For a denormalized number the mantissa is just the value f represented by the fraction field, and the exponent is 1 − bias. This setting realizes a smooth transition from denormalized numbers to normalized numbers. Denormalized numbers also provide a way to represent 0, as well as values that are very close to 0.
  • Normalized numbers: all numbers whose exponent field is neither 0 nor all ones (15 in this 4-bit example) are normalized. For a normalized number the exponent is e − bias, where e is the value represented by the exponent field, and the mantissa is 1 + f, so the range of the mantissa is $1 \leqslant M < 2$.
  • Special values: a number is special when the exponent bits are all ones. It represents infinity when the fraction field is all zeros, and NaN (Not a Number) when the fraction field is not all zeros.

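To make the three cases concrete, here is a minimal sketch in C (my own illustration, not taken from the book) that decodes the 8-bit format described above; the function name decode8 and the test values are assumptions for illustration only.

```c
/* Decode the toy 8-bit float: 1 sign bit, 4 exponent bits, 3 fraction bits,
 * bias = 2^(4-1) - 1 = 7.  Illustrative sketch, not the book's code. */
#include <stdio.h>
#include <stdint.h>
#include <math.h>

static double decode8(uint8_t bits) {
    int s = (bits >> 7) & 0x1;   /* sign bit                */
    int e = (bits >> 3) & 0xF;   /* 4-bit exponent field    */
    int f = bits & 0x7;          /* 3-bit fraction field    */
    int bias = 7;
    double sign = s ? -1.0 : 1.0;

    if (e == 0xF)                /* exponent all ones: special values        */
        return f == 0 ? sign * INFINITY : NAN;
    if (e == 0)                  /* exponent all zeros: denormalized         */
        return sign * (f / 8.0) * pow(2.0, 1 - bias);   /* M = f, E = 1 - bias */
    /* otherwise normalized: M = 1 + f, E = e - bias */
    return sign * (1.0 + f / 8.0) * pow(2.0, e - bias);
}

int main(void) {
    /* 0 0111 100 : sign 0, e = 7, f = 4/8  ->  (1 + 0.5) * 2^0 = 1.5        */
    printf("%g\n", decode8(0x3C));
    /* 0 0000 001 : smallest positive denormalized value, (1/8) * 2^-6       */
    printf("%g\n", decode8(0x01));
    return 0;
}
```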
Here is an example (source: Understanding Computer Systems)

P.S.: When converting a binary fraction such as $0.11_2$, take $2^n$ as the denominator, where n is the number of digits to the right of the binary point, read those fractional digits as a binary integer to get the numerator, and add the integer part to the left of the point to obtain the floating-point value. For example, $0.11_2 = 3/4 = 0.75$.
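As a small sketch of this conversion rule (again my own illustration; bin_fraction is a made-up helper name), the following C program reads the fractional digits as an integer numerator over $2^n$:

```c
/* Convert a binary string such as "0.11" to its value:
 * numerator = fractional bits read as an integer, denominator = 2^n,
 * where n is the number of digits after the binary point. */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

static double bin_fraction(const char *s) {
    const char *dot = strchr(s, '.');
    long int_part = strtol(s, NULL, 2);      /* digits left of the point     */
    if (!dot) return (double)int_part;
    long num = strtol(dot + 1, NULL, 2);     /* fractional bits as an integer */
    long den = 1L << strlen(dot + 1);        /* 2^(number of fractional bits) */
    return int_part + (double)num / den;
}

int main(void) {
    printf("%g\n", bin_fraction("0.11"));    /* 3/4 = 0.75       */
    printf("%g\n", bin_fraction("1.101"));   /* 1 + 5/8 = 1.625  */
    return 0;
}
```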