Inverse of Fast Square Root

YouTube analysis: www.youtube.com/watch?v=p8u…

Wikipedia analysis: en.wikipedia.org/wiki/Fast_i…

IEEE 754’s float representation consists of three parts:

Sign(1bit) is definitely 0 here
So the Exponent is 8bit, E
Mantissa(23bit), short for M

So the floating-point representation is 1 + M/ 2^23 times 2^ E-127.

The explanation in the video is very clear, and the general idea includes the following:

Use log2 to separate out the (E-127) portion
Log2 (1 + x) ~ = (x + u), which u involves the choice of the magic number, then the log (1 + M / 2 ^ (23)) ~ = M / 2 ^ (23) + u
Convert to integer representation (E + M/(2^23)) <<23 = (E<<23) + M. That’s the integer representation of float
Newton’s iteration of f(x) = 0 is x = x0 – f(x0)/f'(x0). So f of x is equal to 1 over x squared minus number is equal to 0

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5 F;

    x2 = number * 0.5 F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed

    return y;
}
Copy the code

But if you go all the way around, you get this thing right here.

Here u is the estimate of log of 1 plus x minus x. If u=0 is assumed, the calculated floating-point number is as follows, with an M error of about 6%

In [56]: a = (1<< 23) * 1.5  * (127)

In [57]: b = 0x5f3759df

In [60]: a, b, a-b, (a-b) / (1 << 23)
Out[60] : (1598029824.0.1597463007.566817.0.0.06756985187530518)
Copy the code

We can do an experiment to see if the error in M gets better under different u, and what is the approximate range of U on average

In [11] :def test() :
    ...:     data = []
    ...:     importmath ... :for x2 in range(100) :... : x = x2 *0.01. : u = math.log2(1+ x) - x ... : a = (1 << 23) * 1.5 * (127- u) ... : b =0x5f3759df. : r =abs(a - b) / (1 << 23)
    ...:         data.append((round(x, 4), round(u, 4), round(r, 4)))
    ...:     data.sort(key = lambda x: x[2])
    ...:     avgu = sum((x[1] for x in data)) / len(data) ... :print('====top10=====')
    ...:     for x in data[:10] :... :print(x) ... : value =1.5 * (1 << 23) * (127- avgu) ... : number =hex(int(value)) ... :print('avg u = %.4f, number = %s' % (avgu, number))
Copy the code

The result is as follows. It can be seen that the value of U is approximately in the range of 0.0439-0.046, and the average value of U is 0.0573. The corresponding number is 0x5F34FF97. For the magic number 0x5F3759df, the corresponding U is 0.0450466.

= = = = top 10 = = = = = (0.81, 0.046, 0.0014) (0.82, 0.0439, 0.0017) (0.13, 0.0463, 0.0019) (0.12, 0.0435, 0.0023) (0.8, 0.048, 0.0044) (0.83, 0.0418, 0.0048) (0.14, 0.049, 0.006) (0.11, 0.0406, 0.0067) (0.79, 0.05, 0.0074) (0.84, 0.0397, Avg u = 0.0573, number = 0x5f34FF97Copy the code

We could have written an implementation of SQRT (x) in the same way, but in Newtonian iterations, there’s a division calculation, and you have to iterate twice to get a more accurate result.

float Q_sqrt( float number )
{
    long i;
    float y;

    // X = int(127-u) * (1 << 22)
    / / u = 0.0573
    #define X 0x1fbc5532
    / / u = 0.045
    // #define X 0x1fbd1df4
    y  = number;
    i  = * ( long * ) &y;
    i = (i >> 1) + X;
    // to avoid negative value.
    // i = i & 0x7fffffff;
    y  = * ( float * ) &i;
    y = 0.5 f * (y + number / y);
    y = 0.5 f * (y + number / y);
    return y;
}

int main(a) {
    for(int i=10; i<=1200; i+=20) {
        float x = Q_sqrt(i);
        cout << "i = " << i << ", x=sqrt(i) = " << x << ", x*x= " << x * x << endl;
    }
    return 0;
}
Copy the code

Related Posts

Java enterprise station source adaptive responsive compatible mobile phone tablet PC SSM Freemaker static engine mainstream framework

Dubbo – Timing task sharp tool time wheel

JAVA concurrency (4) atomic operation implementation principle