Introduction

How would you write the following formula in code?


$$f(x) = \frac{1}{\sqrt{x}}$$

If it were me, I would write it in one line of code:

float y = 1 / sqrt(x);

Quake 3 did it in an interesting way:

float Q_rsqrt( float number )
{
  long i;
  float x2, y;
  const float threehalfs = 1.5F;
  
  x2 = number * 0.5F;
  y = number;
  i = * ( long * ) &y;                       // evil floating point bit level hacking
  i = 0x5f3759df - (i >> 1);                 // what the fuck?
  y = * ( float * ) &i;
  y = y * (threehalfs - ( x2 * y * y ) );    // 1st iteration
  y = y * (threehalfs - ( x2 * y * y ) );    // 2nd iteration, this can be removed
  
  return y;
}

If you look at the number 0x5f3759df, you might be a little confused, and the three classic questions of life arise:

  1. Who is it?
  2. Where did it come from?
  3. What does it do?

This article explains where this magic number comes from and what it does.

Why do we need this algorithm?

When you implement a physical effect in your game, such as lighting, shadows, or reflections, you need to normalize vectors, which means computing the square root of the vector's length. If the vector is too long or too short, you get a wrong result. Assume the incident point is A, the origin is O, and the normal vector of the reflecting plane is $\vec{N}$. The reflection point B is calculated with the reflection formula:


$$\vec{OB} = -\vec{OA} + 2 \vec{N} \cdot \mathrm{dot}(\vec{N}, \vec{OA}) \\ \mathrm{dot}(\vec{N}, \vec{OA}) = |\vec{OA}| \cos\theta \\ \cos\theta = \frac{\vec{OA} \cdot \vec{N}}{|\vec{OA}| \, |\vec{N}|}$$

$\mathrm{dot}(\vec{N}, \vec{OA})$ denotes the projection of $\vec{OA}$ onto the normal vector $\vec{N}$.

And the formula looks something like this


$$\vec{OB} = -\vec{OA} + 2 \vec{N} (\vec{OA} \cdot \vec{N})$$

This mix of dot product and scalar multiplication is not associative, so $\vec{OA} \cdot \vec{N}$ must be computed first; the result is a scalar, which is then multiplied by the unit normal vector to give a vector, and only then can the reflection vector be calculated.

So one thing that really matters in this computation is finding the unit normal vector: when the code actually runs, you need to normalize some vector $(x, y, z)$:


$$x_0 = x \cdot \frac{1}{\sqrt{x^2 + y^2 + z^2}} \\ y_0 = y \cdot \frac{1}{\sqrt{x^2 + y^2 + z^2}} \\ z_0 = z \cdot \frac{1}{\sqrt{x^2 + y^2 + z^2}}$$

As you can see, the reciprocal of a square root is calculated very frequently, so you need a very fast algorithm for it. Multiplication and addition are designed to be very fast on computers, so x*x + y*y + z*z is cheap to compute, but sqrt(x*x + y*y + z*z) is slow, and taking the reciprocal of a number (a division) is also slow, so the one-line implementation above is a very slow way to compute the inverse square root. In Q_rsqrt(x*x + y*y + z*z) there is no division and no square root; everything is multiplication, shifts, and other fast operations. The fast inverse square root actually computes an approximation, with about 1% error, but it runs about three times faster than the naive version. The following sections explain these lines of code.
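As a rough sketch of how this speed-up is used in practice (a minimal example of my own, assuming a simple Vec3 struct; the real engine code is organized differently), vector normalization with Q_rsqrt looks like this:

#include <math.h>

/* A minimal 3D vector type for illustration (not the engine's own). */
typedef struct { float x, y, z; } Vec3;

float Q_rsqrt(float number); /* the Quake 3 routine shown above */

/* Normalize v with the fast inverse square root: only multiplies and adds,
   no division and no sqrt call. */
Vec3 normalize_fast(Vec3 v)
{
    float inv_len = Q_rsqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    Vec3 out = { v.x * inv_len, v.y * inv_len, v.z * inv_len };
    return out;
}

/* The straightforward version for comparison: one sqrt and one division. */
Vec3 normalize_slow(Vec3 v)
{
    float len = sqrtf(v.x * v.x + v.y * v.y + v.z * v.z);
    Vec3 out = { v.x / len, v.y / len, v.z / len };
    return out;
}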


Parameter initialization

long i;
float x2, y;
const float threehalfs = 1.5F;

x2 = number * 0.5F;
y = number;

The function takes a 32-bit floating-point number as its argument. It then declares a 32-bit integer variable i, two 32-bit floating-point variables x2 and y, and a 32-bit floating-point constant threehalfs holding $\frac{3}{2}$. The next two lines are also very simple: one assigns half of number to x2, and the other assigns number itself to y. The interesting thing to notice is that all of these variables, whether integer or floating-point, occupy 32 bits in memory, and that is the basis for the magic.

The next few comments (necessary because you can't read the code directly) are "evil bit hack", "what the fuck", and the Newton iteration. These three comments mark the three important steps of the algorithm. Before explaining them, let's start with how floating-point numbers are represented in memory.

Representation of binary numbers

In everyday life we use decimal notation to represent real numbers, called true values. A negative number is preceded by a '-' sign to show that it is less than 0, and a '+' sign (usually omitted) marks a positive number. A computer, however, can only represent the two states 0 and 1, so the plus and minus signs must also be encoded as 0 or 1, usually in the highest bit, as a sign bit. To make arithmetic on these machine numbers convenient and fast, computers have used several representations; the most common are sign-magnitude (the "original code"), ones' complement, two's complement, and offset (shift) code. The following uses the 8-bit number 0000 0100, i.e. the decimal number 4, to introduce these representations.

  • Sign-magnitude (original code)

Sign-magnitude is the easiest to understand: the first bit is the sign bit and the remaining seven bits hold the magnitude (the true value). For a negative number, the sign bit is simply set to 1 and the remaining seven bits are still the magnitude, so the sign-magnitude representations of 4 and -4 are 0000 0100 and 1000 0100.

It is therefore easy to see that sign-magnitude represents the range [-127, 127], and that it has two special numbers, +0 and -0.

  • Ones' complement

The ones' complement of a positive number is the same as its sign-magnitude form, while the ones' complement of a negative number keeps the sign bit and inverts all the other bits, so the ones' complement representations of 4 and -4 are 0000 0100 and 1111 1011.

So ones' complement also represents the range [-127, 127], and it still has the two special numbers +0 and -0.

  • Two's complement

The two's complement of a positive number is the same as its sign-magnitude form, while the two's complement of a negative number keeps the sign bit, inverts all other bits, and adds 1 (i.e. ones' complement plus 1), so the two's complement representations of 4 and -4 are 0000 0100 and 1111 1100.

So two's complement represents the range [-128, 127]: the pattern that used to mean -0 now means -128, so one more number can be represented.

  • Offset (shift) code

On the number line, the range represented by the shift code corresponds to moving the true value up by $2^{n-1}$ units, which for 8 bits maps [-128, 127] onto [0, 255]. The offset is therefore 128, so the shift codes of 4 and -4 are 1000 0100 and 0111 1100.

One advantage of the shift code is that comparison is easy: you can compare the bit patterns directly, as if they were unsigned numbers.
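To make the four encodings concrete, here is a small demonstration of my own (not from the original article) that builds the 8-bit representations of 4 and -4 by hand and prints them:

#include <stdio.h>

/* Print the low 8 bits of v as a binary string. */
static void print8(unsigned v)
{
    for (int bit = 7; bit >= 0; bit--)
        putchar((v >> bit) & 1 ? '1' : '0');
    putchar('\n');
}

int main(void)
{
    int n = 4;
    unsigned mag = (unsigned)n & 0x7F;            /* 7-bit magnitude */

    printf("sign-magnitude   +4: "); print8(mag);                    /* 0000 0100 */
    printf("sign-magnitude   -4: "); print8(0x80 | mag);             /* 1000 0100 */
    printf("ones' complement -4: "); print8(0x80 | (~mag & 0x7F));   /* 1111 1011 */
    printf("two's complement -4: "); print8((unsigned)(-n) & 0xFF);  /* 1111 1100 */
    printf("offset-128 code  +4: "); print8((unsigned)(n + 128));    /* 1000 0100 */
    printf("offset-128 code  -4: "); print8((unsigned)(-n + 128));   /* 0111 1100 */
    return 0;
}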

Floating-point numbers

First consider the question, how would you represent 4.25 in 32-bit binary? It might look something like this:

The idea of splitting the bits this way is natural coming from ordinary decimal, but in binary it would mean 1 sign bit, 15 integer bits, and 16 fraction bits, so the total range of representable numbers would only be $[-2^{15}, 2^{15}]$, whereas a long integer covers $[-2^{31}, 2^{31}]$, a difference of a factor of about 65,000. This scheme gives up half of the bits just to represent the fraction, which is not worth it, so the IEEE754 standard was proposed instead.

IEEE754

Before describing the standard, recall scientific notation, which in decimal looks like this:

$$12300 = 1.23 \times 10^{4} \\ 0.00123 = 1.23 \times 10^{-3}$$

Similarly, scientific notation can be applied to binary

$$11000_{(2)} = 1.1_{(2)} \times 2^{4} \\ 0.0101_{(2)} = 1.01_{(2)} \times 2^{-2}$$

Therefore, IEEE754 also uses this scientific-notation form, dividing the 32 bits into the following three parts:

  • Sign Bit

First of all, the first bit is the sign bit, 0 is a positive number, 1 is a negative number, and in the square root calculation, negative numbers are obviously not involved, so the first bit must be 0.

  • Exponent

The second part is the 8-bit exponent, which as an unsigned field covers [0, 255]. But that alone can only represent non-negative values, so negative exponents must be accommodated too; the IEEE754 standard stores the exponent as a shift code, because with a shift code comparing exponents during floating-point comparison becomes a simple bitwise comparison. There is, however, a small difference from an ordinary shift code: the patterns 0000 0000 and 1111 1111 are reserved for denormalized numbers and special values, so the offset is 127 rather than 128, and the true exponent ranges over [-126, 127].

For example, the 8-bit true value of the number 4 is 0000 0100; adding the offset of 127 gives 131, so the stored exponent for 4 is 1000 0011.

  • Mantissa

The last 23 bits are the mantissa, representing the fractional part, whose value ranges over $[0, 1 - 2^{-23}]$. In scientific notation the leading digit is 1 by default, so the significand actually ranges over $[1, 2 - 2^{-23}]$; since that leading 1 is implicit, none of the 23 bits is wasted storing it, and all 23 bits are used to represent a finer-grained fraction.

Let's convert 9.625 into its IEEE 754 machine representation:

First, the corresponding binary number is 1001.101, which in normalized floating-point form is $1.001101 \times 2^{3}$. The sign bit is therefore 0, the exponent field is $3 + 127 = 130$, i.e. 1000 0010, and the mantissa, after dropping the leading 1, is 001 1010 0000 0000 0000 0000. So the final result is 0 10000010 00110100000000000000000, i.e. 0x411A0000.
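As a quick sanity check (a standalone sketch of my own, not part of the Quake code), we can print the bit fields of 9.625f and compare them with the breakdown above:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    float f = 9.625f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret the float's bits as an integer */

    printf("bits     = 0x%08X\n", (unsigned)bits);                 /* 0x411A0000 */
    printf("sign     = %u\n", (unsigned)(bits >> 31));             /* 0 */
    printf("exponent = %u\n", (unsigned)((bits >> 23) & 0xFF));    /* 130 = 1000 0010 */
    printf("mantissa = 0x%06X\n", (unsigned)(bits & 0x7FFFFF));    /* 0x1A0000 = 001 1010 0... */
    return 0;
}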

IEEE754 also defines denormalized numbers, NaNs, infinities, and +0 and -0. Obviously none of these will be used as input to this function, so this article will not discuss them.

Binary conversion

Earlier we saw how a floating-point number is represented as a machine number on a computer. Now we need to do some manipulation on this floating-point number to make it easier to deal with later.

Again take the number 9.625 as an example. First, let the value of the exponent field be E; then


$$E = 1000\,0010_{(2)} = 130_{(10)}$$

Let the value of the mantissa field be M; then


$$M = 001\,1010\,0000\,0000\,0000\,0000_{(2)} = 0.001101_{(2)} \times 2^{23}$$

Now, instead of interpreting these 32 bits as a floating-point number, read them as a 32-bit integer; call that integer L. Its decimal value is


$$L = E \cdot 2^{23} + M$$

This number L is the unsigned integer represented by these 32 bits, and this number is going to be useful, so we’ll leave it there for the moment, and we’ll come back to it.
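For the 9.625 example this is easy to verify by hand:

$$L = E \cdot 2^{23} + M = 130 \cdot 2^{23} + 1703936 = 1090519040 + 1703936 = 1092222976 = \text{0x411A0000}$$

which is exactly the bit pattern we worked out above.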

Next, we write the decimal value of the floating-point number F as a formula:


$$F = \left(1 + \frac{M}{2^{23}}\right) \times 2^{E - 127}$$

The mantissa has 1 added because IEEE754 drops the leading 1, so it must be added back in the calculation; the exponent has 127 subtracted because the offset is 127, and subtracting 127 from the 8-bit stored value recovers the true exponent.

And then the interesting thing happens, we now take the logarithm of this number F


$$\log_2 F = \log_2\left(1 + \frac{M}{2^{23}}\right) + E - 127$$

In this formula, $E - 127$ is easy to compute, but the logarithm is a bit tricky, so we substitute an approximation for it.

So let's compare the curves $y = \log_2(1 + x)$ and $y = x$.

From the picture, we can easily come to the following conclusion


$$\log_2(1 + 0) = 0 \qquad\qquad \log_2(1 + 1) = 1$$

so the two curves coincide at the endpoints of $[0, 1]$.

It is then easy to see that on the interval [0, 1], $\log_2(x + 1)$ is actually very close to $x$, so we can pick a correction constant $\mu$ on [0, 1] that makes the following approximation hold:


$$\log_2(x + 1) \approx x + \mu$$

So now we know how to simplify the logarithm; substituting this simplification into the floating-point formula above gives


$$\begin{aligned} \log_2 F &= \log_2\left(1 + \frac{M}{2^{23}}\right) + E - 127 \\ &\approx \left(\frac{M}{2^{23}} + \mu\right) + E - 127 \\ &= \left(\frac{M}{2^{23}} + \frac{E \cdot 2^{23}}{2^{23}}\right) + \mu - 127 \\ &= \frac{1}{2^{23}}\left(M + E \cdot 2^{23}\right) + \mu - 127 \end{aligned}$$

$M + E \cdot 2^{23}$ is exactly the integer L, the bit pattern of the floating-point number F read as an integer; substituting it in gives


$$\log_2 F \approx \frac{L}{2^{23}} + \mu - 127$$

To some extent, ignoring the scaling and shifting, we can think of the integer bit pattern L of a floating-point number as the logarithm of the number F itself, i.e.


$$\log_2 F \approx L$$
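This claim is easy to test numerically. The following sketch (my own, with the function name log2_from_bits made up for illustration) reads a float's bits as an integer, applies the scaling and shifting from the formula above using the correction constant $\mu = 0.0450465$ that the article derives later, and compares the result against the real log2:

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <math.h>

/* Approximate log2(f) from the raw bit pattern of f, with mu = 0.0450465. */
static double log2_from_bits(float f)
{
    uint32_t L;
    memcpy(&L, &f, sizeof L);                 /* the float's bits read as an integer */
    return L / 8388608.0 - 127.0 + 0.0450465; /* L / 2^23 - 127 + mu */
}

int main(void)
{
    float samples[] = { 0.5f, 1.0f, 2.0f, 9.625f, 100.0f };
    for (int k = 0; k < 5; k++) {
        float f = samples[k];
        printf("F = %8.3f  approx = %9.5f  true log2 = %9.5f\n",
               f, log2_from_bits(f), log2(f));
    }
    return 0;
}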

The three steps

After all this groundwork on number representations, we can finally walk through the three steps of the algorithm.

evil bit hack

As we all know, every variable has its own address, and when the program runs it fetches the variable's value from that address and then performs its computations; for example, i and y each occupy 32 bits somewhere in memory.

Long integers shift very nicely: if I want to multiply or divide a number by $2^n$, I can simply shift it n bits to the left or right. Floating-point numbers obviously can't be shifted like that; their binary representation is not designed for bitwise operations.

So here is a first idea: convert the float to an int and then do the bit operations, like this:

long i = (long) y;

If y is 3.33, C simply discards the fractional part and i becomes 3 after the cast. Who can afford to lose that much precision? If we want to do bitwise operations without changing a single bit, we need this code instead:

i = * ( long * ) &y;

What this line of code does is first take the address of the floating-point number y, which we can imagine is 0x3d6f3d79; this address has type float *, so C would normally read the value there as a floating-point number. To make C treat it as a long integer, the address itself has to be cast: convert the float * to a long *.

This cast changes nothing in memory: it doesn't change the address 0x3d6f3d79, and it doesn't change the data stored at that address. What changes is only C's "understanding" of the data: instead of decoding the bits at that address according to IEEE754, C now believes this is the address of a long integer and reads it as one. So (long *)&y says "0x3d6f3d79 is where a long integer is stored", and then the * operator takes the data out of that address and assigns it to the long integer variable i.

In this way we completely bypass the meaning of the number itself and pull out its raw binary data purely through this address trick.
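As an aside (not part of the original Quake code): the pointer cast technically violates C's strict aliasing rules, and a common modern way to get the same bit reinterpretation is to copy the bytes with memcpy, which compilers optimize down to the same zero-cost move. A minimal sketch:

#include <string.h>
#include <stdint.h>

/* Reinterpret the bits of a float as a 32-bit integer without a pointer cast. */
static uint32_t float_bits(float f)
{
    uint32_t i;
    memcpy(&i, &f, sizeof i);
    return i;
}

/* And the reverse direction: reinterpret a 32-bit integer's bits as a float. */
static float bits_to_float(uint32_t i)
{
    float f;
    memcpy(&f, &i, sizeof f);
    return f;
}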

what the fuck

Well, we all know that shifting one bit to the left multiplies a number by 2 and shifting one bit to the right divides it by 2, for example


$$\begin{aligned} x &= 110_{(2)} &&= 6 \\ x << 1 &= 1100_{(2)} &&= 12 \\ x >> 1 &= 11_{(2)} &&= 3 \end{aligned}$$

Now back to the inverse square root; a simple transformation gives


$$\frac{1}{\sqrt{x}} = x^{-\frac{1}{2}}$$

This is the value we ultimately want, and we are going to approach it step by step with approximations.

In the previous section we concluded that the binary representation of a floating-point number is, roughly, a logarithm of the number itself. So if the floating-point number is stored in y, we have


$$y = 9.625 \\ \log_2(y) \approx i = 01000001\ 00011010\ 00000000\ 00000000$$

So i effectively stores the logarithm of y, up to the scaling and shifting described earlier.

As mentioned above, what we want is the inverse of the square root of the number, so we can take the logarithm of the inverse square root, which gives the following equations:


$$\begin{aligned} \log_2\left(\frac{1}{\sqrt{y}}\right) = \log_2\left(y^{-\frac{1}{2}}\right) &= -\frac{1}{2}\log_2(y) \\ \log_2(y) &\approx i \\ \log_2\left(\frac{1}{\sqrt{y}}\right) &\approx -(i >> 1) \end{aligned}$$

So -(i >> 1) roughly computes $\log_2\left(\frac{1}{\sqrt{y}}\right)$.

But -(i >> 1) is not exactly an approximation of $\log_2\left(\frac{1}{\sqrt{y}}\right)$: the earlier formula also involves the division by $2^{23}$ and the term $\mu - 127$, so we have to account for those corrections.

Let $\Gamma$ be the inverse of the square root of y:


$$\begin{aligned} \Gamma &= \frac{1}{\sqrt{y}} \\ \log_2 \Gamma &= \log_2\left(\frac{1}{\sqrt{y}}\right) = -\frac{1}{2}\log_2(y) \end{aligned}$$

And then we substitute that into the formula for the logarithm of the floating point number, and we get


$$\begin{aligned} \log_2 \Gamma &= \frac{1}{2^{23}}\left(M_{\Gamma} + E_{\Gamma} \cdot 2^{23}\right) + \mu - 127 \\ \log_2 y &= \frac{1}{2^{23}}\left(M_y + E_y \cdot 2^{23}\right) + \mu - 127 \\ \frac{1}{2^{23}}\left(M_{\Gamma} + E_{\Gamma} \cdot 2^{23}\right) + \mu - 127 &= -\frac{1}{2}\left(\frac{1}{2^{23}}\left(M_y + E_y \cdot 2^{23}\right) + \mu - 127\right) \end{aligned}$$

And now I’m going to go through some transformation and get the following expression


$$\begin{aligned} M_{\Gamma} + E_{\Gamma} \cdot 2^{23} &= \frac{3}{2} \cdot 2^{23}\left(127 - \mu\right) - \frac{1}{2}\left(M_y + E_y \cdot 2^{23}\right) \\ &= \text{0x5f3759df} - (i >> 1) \end{aligned}$$

The left-hand side, $M_{\Gamma} + E_{\Gamma} \cdot 2^{23}$, is exactly the integer bit pattern of $\Gamma$, and the only unknown is the correction constant $\mu$. The derivation from here on is purely mathematical; please refer to the paper FAST INVERSE SQUARE ROOT for the detailed process.

The value of $\mu$ used in the original algorithm is 0.0450465; plugging it in gives


$$\frac{3}{2} \cdot 2^{23}\left(127 - \mu\right) = \frac{3}{2} \cdot 2^{23}\left(127 - 0.0450465\right) = 1597463007$$

The hexadecimal form of this number is 0x5f3759df, which is the magic number mentioned above. We can then compute the approximation based on this number and compare it with the true $\frac{1}{\sqrt{y}}$.

As you can see from the above image, the approximate curve fits well with the original values within the range [1,100], which completes the first few important steps.

y = * ( float * ) &i;

Here we do the reverse of the evil bit hack: treat the long integer's address as the address of a floating-point number and read out that float according to IEEE754. That float is our approximation of $\Gamma$, and at this point the algorithm is almost done. There is, however, still some error in this approximation, and it needs further processing to get closer to the true value.
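To get a feel for how good the raw estimate already is before any correction, here is a small test harness of my own (the function name rsqrt_guess is made up) that stops right after this reinterpretation, with no Newton iteration:

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <math.h>

/* The estimate obtained from the magic constant alone, with no Newton step. */
static float rsqrt_guess(float number)
{
    uint32_t i;
    float y = number;
    memcpy(&i, &y, sizeof i);
    i = 0x5f3759df - (i >> 1);
    memcpy(&y, &i, sizeof y);
    return y;
}

int main(void)
{
    for (float x = 1.0f; x <= 100.0f; x *= 10.0f) {
        float guess = rsqrt_guess(x);
        float exact = 1.0f / sqrtf(x);
        printf("x = %6.1f  guess = %.6f  exact = %.6f  error = %+.2f%%\n",
               x, guess, exact, (guess - exact) / exact * 100.0f);
    }
    return 0;
}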

Newton Iteration

We already have a good approximation, but there is still some error; Newton's iteration can push this approximation even closer to the true value and further reduce the error.

Newton's method is a technique for finding the root of an equation. Suppose we have an equation $f(x) = 0$ and need its root, but the equation cannot be solved directly; then we can find an approximate value to stand in for the true solution.

As shown in the figure above, if the solution of $f(x) = 0$ is $x = x_c$, we first pick an approximation $x_0$ and then, starting from $x_0$, repeatedly obtain values closer and closer to $x_c$.

Draw the tangent line at $(x_0, f(x_0))$; its slope is the derivative of $f(x)$ at $x = x_0$, namely $f'(x_0)$. Finding where this tangent line crosses the x-axis gives $x_1$:


$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}$$

That is one iteration. From the figure, $x_1$ is closer to the true solution than $x_0$; repeating the same step starting from $x_1$ gives an even better approximation $x_2$, and so on. Newton's iteration formula is therefore very simple:


$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$

As the number of iterations grows, $x_n$ gets closer and closer to the true solution.
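As a tiny illustration of the general method (my own example, unrelated to the Quake code): solving $x^2 - 2 = 0$ with Newton's iteration to approximate $\sqrt{2}$:

#include <stdio.h>

int main(void)
{
    /* Newton iteration for f(x) = x^2 - 2, whose positive root is sqrt(2).
       f'(x) = 2x, so x_{n+1} = x_n - (x_n^2 - 2) / (2 * x_n). */
    double x = 1.0; /* initial guess x_0 */
    for (int n = 0; n < 5; n++) {
        x = x - (x * x - 2.0) / (2.0 * x);
        printf("x_%d = %.10f\n", n + 1, x);
    }
    /* converges to 1.4142135624 within a few iterations */
    return 0;
}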

The last line of the algorithm is the simplified formula after one such iteration. Where does it come from? For a floating-point number $a$, we want the reciprocal of its square root:


$$y = \frac{1}{\sqrt{a}}$$

We can turn this into a function:


$$f(y) = \frac{1}{y^2} - a$$

Finding the desired y is the same as finding the root of $f(y) = 0$, so the iteration formula is


$$\begin{aligned} y_{n+1} &= y_n - \frac{f(y_n)}{f'(y_n)} \\ &= y_n - \frac{y_n^{-2} - a}{-2 y_n^{-3}} \\ &= y_n \left(\frac{3}{2} - \frac{1}{2} a y_n^2\right) \end{aligned}$$

This formula corresponds to the code in the algorithm

y = y * (threehalfs - ( x2 * y * y ) );

At this point, the computed y is very close to the true $y = \frac{1}{\sqrt{x}}$, and it is obtained much faster than with the naive version. Sacrificing a little accuracy to gain a lot of speed is, in fact, an important idea in algorithm optimization.
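Finally, a small usage sketch of my own (test values chosen arbitrarily) that compares Q_rsqrt against the straightforward 1/sqrtf:

#include <stdio.h>
#include <math.h>

float Q_rsqrt(float number); /* the Quake 3 function from the top of the article */

int main(void)
{
    float tests[] = { 0.01f, 1.0f, 2.0f, 9.625f, 123456.0f };
    for (int k = 0; k < 5; k++) {
        float x = tests[k];
        float fast = Q_rsqrt(x);
        float exact = 1.0f / sqrtf(x);
        printf("x = %10.3f  Q_rsqrt = %.7f  1/sqrt = %.7f  error = %+.4f%%\n",
               x, fast, exact, (fast - exact) / exact * 100.0f);
    }
    return 0;
}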

Reference

0x5f3759df

FAST INVERSE SQUARE ROOT