Reference: Three Shuffle Algorithms by Lyz_CS

primers

Leetcode 384 scrambles the array so that each number has an equal probability of appearing anywhere in the array.

I have no contact with the shuffling algorithm, first contact with this problem to write the following code:

public int[] shuffle() {
    Random r = new Random();
    for(int i=0; i<nums.length; i++){int j = r.nextInt(nums.length-1);
        intt = nums[i]; nums[i] = nums[j]; nums[j] = t; }}Copy the code

After writing, submit correctly. In the comments section, they came up with a very strange algorithm (actually k-D shuffle algorithm) :

public int[] shuffle() {
        Random r = new Random();
        // Select a random number that matches the range, swap the elements of the two positions, and keep narrowing the range
        for(int i=n-1; i>0; i--){int j = r.nextInt(i+1);
            int t = nums[i];
            nums[i] = nums[j];
            nums[j] = t;
        }
        return nums;
    }
Copy the code

After reading, greatly puzzled. I go back and forth, randomly interchanging with myself and the numbers before me. Why do I write that?

My writing method, will be each number randomly with 0 to n-1 position of the number exchange, obviously the result is also the average distribution (with MATLAB for many simulations, also support this result), the difficulty of thinking is also small, why bother? Read a lot of articles, understand the classic shuffling algorithm, but still did not solve my doubts.

Next, first supplement introduces several classic shuffling algorithm, and then give the answer in the end. Those of you who already know can skip to the end.

The classic Shuffle algorithm

  • Fisher – Yates Shuffle algorithm

The fisher-Yates Shuffle was first proposed by Ronald A. Fisher and Frank Yates. The basic idea of this Shuffle is to randomly select A number from the original array into A new array, as follows:

1. Initialize the original array and the new array, the original array length is N (known); 2. From haven't handle arrays (if left k), randomly generated a [0, k) number between p (assuming that the array starting from 0); (3) from the rest of the number of k take out the number of the first p; 4. Repeat steps 2 and 3 until all Numbers exhausted; 5. From step 3 to take out the sequence of Numbers is a disrupted sequence. To prove: The probability of drawing a lot is the same, and the probability of each number being drawn is the same at each position in the new array.Copy the code
  • Knuth-durstenfeld Shuffle

    Knuth and Durstenfeld improved on Fisher et al. ‘s algorithm by swapping numbers on the original array, saving space for extra O(n). The basic idea of this algorithm is similar to That of Fisher. Each time, a random number is picked from the unprocessed data, and then the number is placed at the end of the array, that is, the end of the array contains the processed numbers. The algorithm steps are as follows:

1. Create an array arR with size n and store values from 1 to N; 2. Generate a random number x from 0 to n-1; 3. Output the value of arR subscript x, that is, the first random number; 4. Swap the tail element of ARR with the element with subscript X; 5. Same as 2, generate a random number x from 0 to n-2; 6. Output the value of arR subscript x, the second random number; 7. Swap the penultimate element of ARR with the element with subscript x; ... For ARR [I], after shuffling, the probability of the n-1st position is 1/n (the random number exchanged for the first time is I) and the probability of the n-2nd position is [(n-1)/n] * [1/(n-1)] = 1/n, (the random number exchanged for the first time is not I, The second is the position of arr[I] (note that if I =n-1, the first swap arr[n-1] will be swapped to a random position)) in the n-k position is [(n-1)/n] * [(n-2)/(n-1)] *... * (n - k + 1)/(n - k + 2)] * [1 / (n - k + 1) = 1 / n (the first random number don't I, for the second time for arr [I] location (as exchange may change)... The n-k is the position of arr[I]).Copy the code
  • Inside-Out Algorithm

    Knuth-durstenfeld Shuffle is an internal shuffling algorithm. After the algorithm is completed, the original data is directly shuffled. Although this method can save space, the original data may need to be retained in some applications, so it is necessary to create another array to store the generated new sequence.

The basic idea of the insideout Algorithm is to scan the data from front to back, randomly insert the data of position I into the previous I (including the ith) position (let's say K) in a new array, and then replace the number of position K in the original data with the number of position I in the new array. It's essentially a new array of positions k interacting with the numbers of positions I. If you know the lengh of arR, you can change it to a for loop, which can handle the unknown number of ARR [] because it is traversed from front to back, or if arR [] is dynamically increasing. The proof is as follows: (1/ I) * [I /(I +1)] * [(I +1)/(I +2)] *... * [(n-1)/n] = 1/n. The probability that the ith element of the original array (the random number) will be positioned after I +1 (including I +1) of the new array (assuming the KTH position) is: (1/k) * [k/(k+1)] * [(k+1)/(k+2)] *... * [(n-1)/n] = 1/n (that is, the KTH time just randomly placed in this position, in the following n-k times the number is not selected).Copy the code

Advantages of shuffling algorithm

In contrast to the mature shuffle algorithm, I iterate over each number in the array and swap it with random numbers from 0 to n-1, so that the array is indeed a randomly distributed array.

So, what’s the difference between my method and existing random algorithms?

The difference is large enough that swapping a number at each position at random with a number from 0 to n minus 1 causes all positions in the array to change until all numbers are iterated. That is, you can’t guarantee even distribution until you manipulate the entire array.

I didn’t see that in the force problem, but if the problem was to randomly pick k numbers out of an array, my algorithm would have to go through the array. On the other hand, the shuffle algorithm, such as KD algorithm, generates a random number that meets the average distribution. Therefore, it is necessary not to change the position that has been operated under the premise of ensuring even distribution. As a result, there is an anti-common sense operation in which the number is only exchanged with itself or the preceding number in each operation.