In this series, we will explain the pre-Java 17 random number API and the post-Java 17 unified API in detail. We will also analyze the random number characteristics and implementation ideas to help you understand why there are so many random number algorithms, and what their design ideas are.

This series will be divided into two parts. The first part will describe the evolution of Java random number algorithm and the underlying principles and considerations, and then introduce the RANDOM algorithm API and test performance before Java 17. The second part will analyze the random number generator algorithm after Java 17, API and the underlying implementation classes and their attributes in detail. Performance and usage scenarios, how to choose random algorithms, etc., and the application of Java random numbers to some future features of Java

This is the first one.

How to generate random numbers

When we use the Pseudo Random Number Generator (PRNG), we generally think of the Pseudo Random Number Generator as a black box:

The output of this black box is usually a number. Let’s say it’s an int number. This result can be converted to any type we want, for example: If what we really want is a long, we can take two, one as the high 32 bits and one as the low 32 bits, and make a long (Boolean, byte, short, char, etc.). If we want a floating-point number, we can take random ints multiple times according to the IEEE standard combination and then take some of the bits to combine into integer bits and decimal places of the floating-point number.

If you want to limit the range, the simplest way is to implement the result as mod + offset. For example, if we want to take a value between 1 and 100, we would mod 99, then take the absolute value, and then +1. And, of course, since mod is a high performance operation, the simplest optimization is to check this number N and N minus 1, and if it’s 0 then the book is 2 to the NTH power (2 to the NTH base means it must be 100000 and so on, minus 1 is 011111, The sum must be 0); Mod 2 to the n is the same thing as taking and minus 2 to the n. This is a simple optimization, but the actual optimization is much more complicated than this.

When initializing the black box, a SEED is usually used to initialize the black box. The SEED can come from a variety of sources. Let’s first click on the table and look at some algorithms in the black box.

Linear congruence algorithm

The first is the most common random number algorithm: Linear congruent Generator. That is, multiply the current Seed by A coefficient, then add an offset B, and then mod it according to C (limit the whole within A certain range, so as to select the appropriate A and B, why do so will be explained later), and then obtain A random number, which will be used as the Seed for the next randomization, namely:

X(n+1) = ( A * X(n) + B ) % C
Copy the code

The advantage of this algorithm is that the implementation is simple and the performance is relatively good. The values of A and B must be carefully selected so that all numbers in the range of C are equally likely. For example, if A = 2, B = 2, C = 10, then the odd numbers 1,3,5,7,9 are not likely to occur in the future. In order to be able to compute A and B properly, we need to limit C to A manageable range. In general, for efficiency purposes, we limit C to 2 to the n. So the mod can be optimized to take and. Fortunately, mathematical gurus have already found these values (aka magic numbers), so we can just use them.

The random sequence generated by this algorithm is deterministic, for example, X is Y and Y is Z, which can be understood as a deterministic loop..

The size of the ring, Period. Because the Period is large enough, the initial SEED is generally different each time, thus approximating randomness. However, assuming that we need multiple random number generators, it is more troublesome, because although we can ensure that the initial SEED of each random number generator is different, under this algorithm, There is no guarantee that the initial SEED of one random number generator is the SEED next (or within a very short step) to the initial SEED of another random number generator. For example, suppose that the initial SEED of one random number generator is X and another is Z. Although X and Z may look very different, they are separated by only one Y in the algorithm’s random sequence. Such different random number generators do not work well.

So how can we ensure that different random number generators are spaced far apart? That is, we can directly separate the initial SEED of another random number generator from the initial SEED of the current one by a larger number by simply computing (instead of computing 100W times and then tuning to the random number after 100W times). This property is called leapability. Xoshiro algorithm based on linear feedback shift register algorithm provides us with a jumping random number algorithm.

Linear feedback shift register algorithm

Linear feedback Shift register (LFSR) is a shift register which uses the Linear function of the output as the input given the output of the previous state. Xor is the most common single-bit linear function: xor is performed on some bits of a register as input, and then shifts each bit in the register as a whole.

However, how to choose these bits is a matter of knowledge. At present, the more common implementation is XorShift algorithm and Xoshiro algorithm based on this further optimization. Xoshiro algorithm is a new random number optimization algorithm with simple calculation and excellent performance. At the same time, the leapability is realized.

The algorithm is jumpy. Assuming that we want to generate two random number generators with large gaps, we can use a random initial SEED to create a random number generator, and then use the jump operation of the algorithm to directly generate a large interval SEED as the initial SEED of another random number generator.

Another interesting thing is that linear congruence algorithms are not reversible, we can only deduce X(n + 1) from X(n + 1), not directly from X(n + 1). This operation corresponds to the business such as playing the song list randomly, the last song and the next song, we do not need to record the whole song list, but only according to the current random number can know. Linear feedback shift register algorithm can be reversible.

Linear feedback shift register algorithm in random sequence generator to generate different also has limitations, namely, they are from the same ring, even by jumping operation for different random number generator interval, but if the pressure is not balanced, with the passage of time, they are likely to SEED, into the same again. So is there a random algorithm that can generate loops of different random sequences?

DotMix algorithm

DotMix algorithm provides another idea, that is, given an initial SEED, set a fixed step M, add the SEED to the step M at random each time, and HASH the value to a HASH value through a HASH function:

X(n+1) = HASH(X(n) + M)
Copy the code

This algorithm has high requirements on the HASH algorithm. If the HASH algorithm changes a little input, the output will change significantly. SplitMix algorithm based on DotMix algorithm uses MurMurHash3 algorithm, which is the underlying principle of SplittableRandom introduced in Java 8.

The nice thing about this algorithm is that it’s easy to identify two random generators with different parameters whose generating sequences are different, for example a generated random sequence is 1,4,3,7… The other one produces 1,5,3,2. This is exactly what the linear congruence algorithm cannot do, because its sequence is determined no matter how it changes the SEED. However, we cannot arbitrarily change the values of A, B and C in the algorithm, because we may not be able to traverse all the numbers, which has been mentioned before. The same goes for Xoshiro. The SplitMix algorithm does not need to worry, we specify different SEED and different step size M to ensure that the generated sequence is different. This property of generating different sequences is called resoluteness

This is why SplittableRandom is better for multithreading than Random (Random based on linear congruence) :

  • Suppose multiple threads are using the sameRandom, to ensure the randomness of the sequence, but there is a performance loss of CompareAndSet new seed.
  • Assume that each thread uses the same SEEDRandom, the random sequence generated by each thread is the same.
  • Assume that each thread uses SEED differentlyRandomBut we can’t guarantee oneRandomIs the SEED another oneRandomThe next result of the SEED (or within a very short step), in which case if the thread pressure is not uniform (when the thread pool is relatively idle, only a few threads are actually working, and these threads are likely to have their private Random come to the same SEED location as other threads), Some threads will have the same random sequence.

SplittableRandom can be assigned to different threads with different SplittableRandom parameters as long as the interface split is used directly, and different parameters can basically ensure that the same sequence can not be generated.

Consider: How can we generate random sequences whose periods are greater than the number they generate?

The simplest way to do this is to combine two sequences whose Period is equal to capacity through polling, so that we get the sequence whose Period is equal to capacity + capacity:

We can also record the results of two sequences directly and then put them together using some kind of operation, such as xor or hash. In this case, Period = capacity * capacity.

If we want to expand more, all can be spliced through the above method. By stitching together the sequences of different algorithms with certain operations, we can obtain the random advantages of each algorithm. One example is the LXM algorithm introduced in Java 17.

LXM algorithm

LXM algorithm (L = linear congruent, X = Xoshiro, M = MurMurHash) is relatively simple to implement. It combines linear congruent algorithm and Xoshiro algorithm, and then hashes through MurMurHash, for example:

  • L34X64M: That is, one 32-bit number is used to hold the results of linear congruent, two 32-bit numbers are used to hold the results of Xoshiro’s algorithm, and MurMurHash is used to merge these results into a 64-bit number.
  • L128X256M: that is, two 64-bit numbers are used to hold the results of linear congruescence, and four 64-bit numbers are used to hold the results of Xoshiro’s algorithm, combining these results into one 64-bit number using MurMurHash.

The LXM algorithm implements segmentation through MurMurhash, without preserving Xoshiro’s jump.

The source of the SEED

Since all random algorithms in the JDK are based on the last input, the resulting random sequence must be the same if we use a fixed SEED. This is not suitable for security-sensitive scenarios. The official definition of cryptographically secure is that SEED must be unpredictable and produce non-deterministic output.

In Linux, the system operation data such as user input and system interrupt will be collected, and the random seed will be generated and put into the pool. The program can read the pool to obtain a random number. However, this pool will be generated only after certain data is collected, its size is limited, and its random distribution is certainly not good enough, so we can not use it to do random number directly, but to use it as the seed of our random number generator. This pool is abstracted in Linux as two files, called /dev/random and /dev/urandom. One is you have to collect a certain amount of entropy before you take it out of the pool or you block it, and the other one is you just go back to what you have whether you’ve collected enough or not.

Before Linux 4.8:

After Linux 4.8:

File :/dev/random blocks when the entropy pool is low, file:/dev/urandom does not. Egd =file:/dev/./urandom to set the JVM startup parameters, use urandom to reduce blocking.

We can also make cracking even harder by periodically resetting all Random seeds with features in the business, such as the number of active users * the number of orders per hour as the new SEED.

Test random algorithm randomness

All the algorithms above realize pseudo-randomness, that is, the current random number results are strongly correlated with the last one. In fact, this is pretty much all of the fast random algorithms out there right now.

And even if we keep the SEED secret, if we know the algorithm, we can still infer the next random output from the current random output. Or we don’t know what the algorithm is, but we can derive the algorithm backwards from a few random results to derive the subsequent results.

For this pseudo-random algorithm, it is necessary to verify that the random numbers generated by the algorithm meet some characteristics, such as:

  • The period is as long as possible: a full cycle or period is when the random sequence iterates through all possible random outcomes and returns to the number of outcomes required by the initial seed. This period should be as long as possible.
  • Equidistribution, each possible result of the generated random number, in a Period to ensure that the number of occurrences of each result is the same. Otherwise, it will affect the use of certain business, such as lottery business, we need to ensure that the probability is accurate.
  • Complexity test: Is the generated random sequence complex enough to not have a regular sequence of numbers, such as geometric sequence, arithmetic sequence, etc.
  • Security testing: It is difficult to reverse this random algorithm with fewer results.

At present, there are many framework tools used to test the random sequence generated by an algorithm, evaluate the results of the random sequence, and verify the randomness of the algorithm. The commonly used tools include:

  • TestU01 randomness test: github.com/umontreal-s…
  • NIST randomness test: nvlpubs. NIST. Gov/nistpubs/le…
  • DieHarder Suite randomness test

Random algorithms built into Java have passed most of testU01’s tests. At present, the optimization algorithms mentioned above all expose some randomness problems more or less. The LXM algorithm in Java 17 is by far the best performing randomness test. Note the randomness, not the performance.

All random algorithms involved in Java (excluding SecureRandom)

  • Linear Congruential generator: doi.org/10.1093%2Fc…
  • Linear feedback shift register: www.ams.org/journals/mc…
  • XORShift: doi.org/10.18637%2F…
  • Xoroshiro128 + : arxiv.org/abs/1805.01…
  • LXM: dl.packetstormsecurity.net/papers/gene…
  • SplitMix: gee.cs.oswego.edu/dl/papers/o…

Why do we rarely consider random security in real business applications

Mainly because we generally do load balancing multi-instance deployment and multi-threading. Typically each thread uses Random instances of a different initial SEED (such as ThreadLocalRandom). And a randomly sensitive business, such as a lottery, is limited by the number of times a single user can do it, so it’s hard to gather enough results to reverse the algorithm and the next result, and you need to draw with other users. Then, we generally restrict the range of random numbers, rather than using raw random numbers, which greatly increases the difficulty of inverse solutions. Finally, we can also periodically set our SEED using some real-time metrics of the business, for example, using the last hour’s (number of active users * number of orders) as a new SEED per hour.

Therefore, we seldom use SecureRandom in real business. Initial SEED if we want to let people to write programs when you can’t guess (also can guess the timestamp), you can specify a random initial SEED source, through the JVM parameter – Djava. Util. SecureRandomSeed = true. This works for all Random number generators in Java (e.g., Random, SplittableRandom, ThreadLocalRandom, etc.)

Corresponding source code:

static { String sec = VM.getSavedProperty("java.util.secureRandomSeed"); If (Boolean. ParseBoolean (SEC)) {// The initial SEED source from SecureRandom, On Linux, the environment variable java.security.egd specified /dev/random or /dev/urandom byte[] seedBytes = java.security.SecureRandom.getSeed(8); long s = (long)seedBytes[0] & 0xffL; for (int i = 1; i < 8; ++i) s = (s << 8) | ((long)seedBytes[i] & 0xffL); seeder.set(s); }}Copy the code

So, for our business, we generally only care about the performance of the algorithm and the average in randomness, and the algorithm that passes the test, randomness in general is not a big problem, so we just care about the performance.

For security-sensitive services such as SSL encryption and generating encrypted random hashes, a higher level of security randomness needs to be considered. This is when SecureRandom is considered. In the implementation of SecureRandom, the Random algorithm is more complex and involves some encryption ideas, so we will not pay attention to the Random algorithm of Secure here.

Before Java 17, how to generate random numbers and the corresponding randomization algorithm

First, the corresponding relation between algorithm and implementation class is given:

Use the JDK API

1. Use java.util.Random and the API based on it:

Random random = new Random();
random.nextInt();
Copy the code

Math.random()The underlying layer is also based on Random

Java. Lang. Math:

public static double random() {
    return RandomNumberGeneratorHolder.randomNumberGenerator.nextDouble();
}
private static final class RandomNumberGeneratorHolder {
    static final Random randomNumberGenerator = new Random();
}
Copy the code

Random itself is designed to be thread-safe because the SEED is Atomic and randomly CAS updates the SEED:

Java. Util. Random:

protected int next(int bits) { long oldseed, nextseed; AtomicLong seed = this.seed; do { oldseed = seed.get(); nextseed = (oldseed * multiplier + addend) & mask; } while (! seed.compareAndSet(oldseed, nextseed)); return (int)(nextseed >>> (48 - bits)); }Copy the code

It can also be seen that Random is based on linear congruence algorithm

2. Usejava.util.SplittableRandomAnd the API based on it

SplittableRandom splittableRandom = new SplittableRandom();
splittableRandom.nextInt();
Copy the code

SplittableRandom is implemented based on SplitMix. Given an initial SEED, set a fixed step M, add the SEED to the step M at random each time through a HASH function (MurMurHash3 in this case). Map the value HASH to a HASH value.

SplittableRandom itself is not thread-safe: Java. Util. SplittableRandom:

public int nextInt() { return mix32(nextSeed()); } private long nextSeed() {return seed += gamma; }Copy the code

ThreadLocalRandom is implemented based on SplittableRandom. We use ThreadLocalRandom in multi-threaded environment:

ThreadLocalRandom.current().nextInt();
Copy the code

The new SplittableRandom returns a parameter via the split method. The new SplittableRandom can be used by different threads to generate random numbers. This is very common in Parallel streams:

IntStream.range(0, 1000)
    .parallel()
    .map(index -> usersService.getUsersByGood(index))
    .map(users -> users.get(splittableRandom.split().nextInt(users.size())))
    .collect(Collectors.toList());
Copy the code

However, the performance of ThreadLocalRandom based on SplittableRandom in multithreaded environments is still worse than that of ThreadLocalRandom based on SplittableRandom due to the lack of alignment padding and other multi-threaded performance optimizations.

Use 3.java.security.SecureRandomGenerate more secure random numbers

SecureRandom drbg = SecureRandom.getInstance("DRBG");
drbg.nextInt();
Copy the code

Generally, this algorithm, based on encryption algorithm, is more complex and has poor performance. It is only used for security sensitive services, but not for general services (such as lottery).

Test performance

Single-threaded test:

Benchmark Mode Cnt Score Error Units TestRandom. TestDRBGSecureRandomInt THRPT 50 to 940907.223 + 11505.342 ops/s TestRandom. TestDRBGSecureRandomIntWithBound THRPT 50 to 992789.814 + 71312.127 ops/s TestRandom testRandomInt THRPT 50 106491372.544 + / - 8881505.674 ops/s TestRandom. TestRandomIntWithBound THRPT 50 to 99009878.690 + 9411874.862 ops/s TestRandom. TestSplittableRandomInt THRPT 50 to 295631145.320 + 82211818.950 ops/s TestRandom. TestSplittableRandomIntWithBound THRPT 50 to 190550282.857 + 17108994.427 ops/s TestRandom. TestThreadLocalRandomInt THRPT 50 to 264264886.637 + 67311258.237 ops/s TestRandom. TestThreadLocalRandomIntWithBound THRPT 50 to 162884175.411 + 12127863.560 ops/sCopy the code

Multithreading tests:

Benchmark Mode Cnt Score Error Units TestRandom. TestDRBGSecureRandomInt THRPT 50 to 2492896.096 + 19410.632 ops/s TestRandom. TestDRBGSecureRandomIntWithBound THRPT 50 to 2478206.361 + 111106.563 ops/s TestRandom testRandomInt THRPT 50 345345082.968 + / - 21717020.450 ops/s TestRandom. TestRandomIntWithBound THRPT 50 to 300777199.608 + 17577234.117 ops/s TestRandom. TestSplittableRandomInt THRPT 50 to 465579146.155 + 25901118.711 ops/s TestRandom. TestSplittableRandomIntWithBound THRPT 50 to 344833166.641 + 30676425.124 ops/s TestRandom. TestThreadLocalRandomInt THRPT 50 to 647483039.493 + 120906932.951 ops/s TestRandom. TestThreadLocalRandomIntWithBound THRPT 50 to 467680021.387 + 82625535.510 ops/sCopy the code

The results are in line with our previously stated expectations, with ThreadLocalRandom performing best in a multi-threaded environment. In a single-threaded environment SplittableRandom and ThreadLocalRandom are basically close and perform better than others. SecureRandom is hundreds of times worse than others.

The test code is as follows (note that while Random and SecureRandom are both thread-safe, ThreadLocal is used to avoid too much performance degradation from compareAndSet.) :

package prng; import java.security.NoSuchAlgorithmException; import java.security.SecureRandom; import java.util.Random; import java.util.SplittableRandom; import java.util.concurrent.ThreadLocalRandom; import org.openjdk.jmh.annotations.Benchmark; import org.openjdk.jmh.annotations.BenchmarkMode; import org.openjdk.jmh.annotations.Fork; import org.openjdk.jmh.annotations.Measurement; import org.openjdk.jmh.annotations.Mode; import org.openjdk.jmh.annotations.Scope; import org.openjdk.jmh.annotations.State; import org.openjdk.jmh.annotations.Threads; import org.openjdk.jmh.annotations.Warmup; import org.openjdk.jmh.infra.Blackhole; import org.openjdk.jmh.runner.Runner; import org.openjdk.jmh.runner.RunnerException; import org.openjdk.jmh.runner.options.Options; import org.openjdk.jmh.runner.options.OptionsBuilder; // The Throughput index is @benchmarkmode (mode.throughput) // The Throughput index is @benchmarkmode (mode.throughput) // The Throughput index is @benchmarkmode (mode.throughput) So you can change your iterations to @warmup (iterations = 1) // Number of Threads @threads (10) @fork (1) // Number of iterations, We go through 50 iterations @Measurement(Iterations = 50) // to define the life cycle of a class instance, @state (value = scope.benchmark) Public class TestRandom {ThreadLocal<Random> Random = ThreadLocal.withInitial(Random::new); ThreadLocal<SplittableRandom> splittableRandom = ThreadLocal.withInitial(SplittableRandom::new); ThreadLocal<SecureRandom> drbg = ThreadLocal.withInitial(() -> { try { return SecureRandom.getInstance("DRBG"); } catch (NoSuchAlgorithmException e) { throw new IllegalArgumentException(e); }}); @Benchmark public void testRandomInt(Blackhole blackhole) throws Exception { blackhole.consume(random.get().nextInt()); } @benchmark public void testRandomIntWithBound(Blackhole Blackhole) throws Exception { Blackhole.consume (random.get().nextint (1, 100)); } @Benchmark public void testSplittableRandomInt(Blackhole blackhole) throws Exception { blackhole.consume(splittableRandom.get().nextInt()); } @ Benchmark public void testSplittableRandomIntWithBound (Blackhole Blackhole) throws the Exception {/ / note that doesn't take 2 ^ n the Numbers, Blackhole.consume (splitTablerandom.get ().nextint (1, 100))); } @Benchmark public void testThreadLocalRandomInt(Blackhole blackhole) throws Exception { blackhole.consume(ThreadLocalRandom.current().nextInt()); } @ Benchmark public void testThreadLocalRandomIntWithBound (Blackhole Blackhole) throws the Exception {/ / note that doesn't take 2 ^ n the Numbers, Because this number is usually not as the range of practical applications, but for the bottom Numbers have optimized blackhole. Consume (ThreadLocalRandom. The current () nextInt (1, 100)); } @Benchmark public void testDRBGSecureRandomInt(Blackhole blackhole) { blackhole.consume(drbg.get().nextInt()); } @ Benchmark public void testDRBGSecureRandomIntWithBound (Blackhole Blackhole) {/ / note that doesn't take 2 ^ n the number, because this number is usually not as the range of practical applications, Blackhole.get ().nextint (1, 100)); } public static void main(String[] args) throws RunnerException { Options opt = new OptionsBuilder().include(TestRandom.class.getSimpleName()).build(); new Runner(opt).run(); }}Copy the code

Wechat search “my programming meow” public account, a daily brush, easy to improve skills, won a variety of offers