In yesterday’s post, we talked about the RSA algorithm. In the fundamental principle of THE RSA algorithm, there are two core prime numbers p and q, which are multiplied to give a number n. Since it is very difficult to decompose p and q from N in reverse, the RSA algorithm is unbreakable at today’s computer level as long as p and q are large enough.
Now, you pause for a moment, open Baidu or Google, search the RSA algorithm tutorial. Read any 10.
You’ll notice that these tutorials invariably say: Find two sufficiently large primes p and Q. But they won’t tell you how to find it.
In the present system of mathematics, primes are found, not generated. There is no perfect general-term formula for generating primes. We can do a quick check to see if a number is prime, but we can’t generate a prime number right now.
So the question is, how do the two primes that are needed to generate the key in RSA come from?
When we use the RSA algorithm to generate a 2048 bit key, we need to find two prime numbers p and q, which are each 1024bit. How big is a 1024bit number? Its minimum value is zeroAnd the maximum is. If you count from the smallest number here, to the largest number here, you can count 100 million numbers per second, you need to count570044753571256946895391042233962688235025678254156066950247593726955466151385601004275993538836681954338260654082297557264046704764131857219835840434659197037569423594829671728507799344387665269701556798848952843855120124119935570376436804099528276139492994306780499238797710357939232321
It takes ten thousand years to count.
In this big range of numbers, I want you to find two primes. You say, this how the fuck to find?
So, what fairy algorithm does Python’s RSA library use to quickly find these two primes? So I went and read the source code. It scared me into a cold sweat.
The key is generated using the rsa.newkeys() function, so I first found this function in the rsa/key.py file:
Starting with lines 758-762, it uses the poolsize parameter to determine how many cores of the CPU to use. If I have a 4-core CPU, I can run four processes simultaneously to find primes. But we can skip this code because in yesterday’s article, we didn’t specify the poolsize parameter, so it uses the default value of 1. So the code runs to line 767, generating p and q from the gen_keys function.
Let’s look at the gen_keys function again:
As you can see, at line 714, p and q are generated by the function find_p_q, and here both p and q are 1024bit if our key is 2048bit.
Let’s look at the find_p_q function again:
This is a long function, but mostly it verifies that the generated p and q are correct (not equal, and sufficiently different), and if they are not, try again. So the real core code is lines 613 and 615. The genprime_func function called here is passed in as an argument. This genprime_func is the rsa.prime. Getprime function we get on line 764 of newKeys.
Now go to the /rsa/prime.py file and read the source code for the getPrime function:
This code is surprisingly simple. In line 162, determine that the number of bits to be generated is not less than 3.
while True:
integer = rsa.randnum.read_random_odd_int(nbits)
# Test for primeness
if is_prime(integer):
return integer
Copy the code
Open an infinite loop and call read_random_odd_int to retrieve the odd number of nbit. Then, use is_Prime to determine if it is prime. If it is, return the number. If it is not prime, then generate an odd number of nbit randomly and determine whether it is prime or not.
Are you fucking kidding me? I randomly generate an odd number in an infinite loop, and then I decide if it’s prime, and then I try again until I get to a random prime number, right?
intoI’m going to randomly pick an odd number out of this range, right? How many years does it take to hit two primes?
To solve this puzzle, let’s look at the prime number theorem.
For positive real numbers, π(x) is defined as the prime counting function, that is, the number of prime numbers not greater than x. Mathematicians have found some functions to estimate the growth of PI (x) :
inIf it is large enough, you can use this company to estimate that it is not larger thanThe number of primes of theta.
So let’s see, in thetoWhat is the density of primes in the range of:
The density of primes is a whopping 0.14%! So the chance that a random number is not prime is 99.86%. Let’s calculate that if 10,000 numbers were randomly selected, even without considering parity:
In other words, out of 10,000 random numbers, the chance that no prime number will appear is one in 10 million. The probability of a prime number appearing is more than 99.9999%
It doesn’t take that long to loop 10,000 times in Python. So, the ALGORITHM in the RSA library, no problem!!
Finally, you might want to check out the is_prime function in prim.py, which is used to quickly determine whether a number is prime or not. And read_random_odd_int in randnum.py to randomly generate a count. The code is simple, and you can learn a lot.