• 👏 About the author: Hello, EVERYONE, I love to knock code xiao Huang, the Unicorn enterprise Java development engineer
  • 📝 personal public number: love to knock code of small huang
  • 📕 series of columns: Java design patterns, data structures, and algorithms
  • 📧 If the article knowledge point has the wrong place, please correct! Learn with you and make progress 👀
  • 🔥 if you feel that the blogger’s article is good, please 👍 three even support 👍 a blogger oh
  • 🍂 bloggers are working hard to complete project 2022: dream as a horse, set sail, 2022 Dream chasers

One, the introduction

Recently, xiao Huang’s friend went to an industry famous game company (interested readers can guess) interview, out of a classic topic: at present, the game needs to carry out a sign-in lottery activity, how to ensure that everyone draws the probability of the prize is equal, to ensure that the number of the final winner is 10 people

Huang’s friends calculate the total number of people who check in, use the Random function to get the winners, and so on, and finally find 10 winners

But the interviewer wasn’t happy with this and offered several conditions:

  • The number of people who signed inNLarge and unknowable
  • Randomly selected10Individuals, each of them has a probability of being selected10/N
  • Time complexity isO(N)

Xiao Huang friend momentarily a little confused, at a loss, chat about something else, scribbled over the interview

My friend talked with Xiao Huang and finally found that the interviewer wanted to investigate Reservoir Sampling.

Energetic readers can check out these template questions in advance:

  • 382. Linked list random nodes
  • 398. Random number index
  • 519. Random inversion matrix
  • 497. A random point in a nonoverlapping rectangle

2. Reservoir sampling algorithm

According to the conditions that the number of people N is unknowable, the probability of each candidate is 10/N, and the time complexity is O(N), we can know that our algorithm must traverse once, and the probability of each candidate decreases gradually with the increase of the number of sign-in, but the probability of each candidate is the same

For (int num = 1; num <= 100; num++)

  • whennum = 10, the probability that we pick each number is10/10
  • whennum = 50, the probability that we pick each number is10/50
  • whennum = 100, the probability that we pick each number is10/100

We can see that with the dynamic increase of the number of people, our probability is different. If we simply use the Random algorithm above, we have no way to adjust dynamically

Below, let’s take a look at the secret of reservoir sampling algorithm

1. The train of thought

Our overall pool is as follows:

The currentnum <= 10“, we let it directly into the pool

If the current num is greater than 10, we randomize from 1 to num

  • If the random result is less than or equal to 10, the pool is replaced
  • If the random result is greater than 10, no operation is done

Assuming the current value is 12, we use random.nextint (12) + 1 to get random values from 1 to 12

  • Let’s say the random value is zero5,12Subscript the replacement pool to5The number of

  • Let’s say the random value is zero11, no operation is performed

The final number in the reservoir is the number of people who signed in and won

At this point, I’m sure 80% of you will be confused, and that’s okay. I’ve been there, too

Let’s do the following proof

2. To prove

We assume that the current num is 30, so each value has a 10/30 chance of being selected

First, we prove the probability of 3:

  • for3In terms of the probability of entering the pool:100% by 100The probability of getting out of the pool is equal to the probability of being replacednum > 10)
    • 11To be able to replace3The probability of:(10/11) * (1/10) = (1/11)(Probability of replacing pools * random pools is3The probability of)
    • 12To be able to replace3The probability of:(10/12) * (1/10) = (1/12)
    • .
  • Let’s think about it the other way around, so let’s3What’s the probability of not getting out of the pool?
    • (3) Probability of entering the pool * (3) probability of not getting out of the pool
    • 1 * (10/11) * (11/12) * (12/13) *... (29/30) = (10/30)
    • It fits our final result

Finally, we prove the probability of 17:

  • for17In terms of the probability of entering the pool:10/17The probability of getting out of the pool is equal to the probability of being replacednum > 17)
    • 18To be able to replace17The probability of:(10/18) * (1/10) = (1/18)(Probability of replacing pools * random pools is17The probability of)
    • 19To be able to replace17The probability of:(10/19) * (1/10) = (1/19)
    • .
  • Let’s think about it the other way around17What’s the probability of not getting out of the pool?
    • (17) Probability of getting into the pool * (17) probability of not getting out of the pool
    • (17/18) (10/17) * * * (19/20) (18/19) *... (29/30) = (10/30)
    • It fits our final result

3. Code

public class _Reservoir algorithm{
    static Random random = new Random();
    public static void main(String[] args) {
    	// 10000 tests
        int test = 10000;
        // The number is 100
        int dataBase = 100;
        // Record the frequency of each occurrence
        int[] count = new int[101];
        for (int i = 0; i < test; i++) {
        	/ / reservoir
            int[] bag = new int[10];
            // Reservoir subscript
            int bagIndex = 0;
            for (int num = 1; num <= dataBase; num++) {
                // If less than 10, go directly to the pool
                if (num <= 10) {
                    bag[bagIndex++] = num;
                } else {
                    // If it is greater than 10, the pool can be pooled with a probability of 10 / I
                    // if less than or equal to 10, the pool can be added
                    if (getNum(num) <= 10) {
                        // Randomly eliminate a number in the pool
                        int index = random.nextInt(10); bag[index] = num; }}}// Record the current occurrence frequency
            for (intnum : bag) { count[num]++; }}// Output and verify our random rate
        for (int i = 0; i < count.length; i++) { System.out.println(count[i]); }}// Return the number of [1, I]
    public static int getNum(int i) {
        return random.nextInt(i) + 1; }}Copy the code

Test results: We found that the frequency of occurrence is roughly equal

999
1009
967
1050
981
996
996
950
971
998
1005
1032
974
982
1007
1029
1026
1045
Copy the code

Three, sample exercises

As the saying goes, it’s all about seeing, it’s all about doing, and it’s time to test your results

Buckle template title:

  • 382. Linked list random nodes
  • 398. Random number index
  • 519. Random inversion matrix
  • 497. A random point in a nonoverlapping rectangle

Basic is the template problem, directly set above the formula can be

Let me play a small advertisement here, pay attention to the public number: love to knock code huang, reply: algorithm source code, not only can obtain reservoir and other types of algorithm source code.

Your attention may bring me closer to the peak of success, study hard together, and grow hard.

Four,

That’s the end of the basic reservoir sampling algorithm

In engineering mainly applied to the lottery system, here some people may question, I directly wait for the lottery finished, and then calculate the total number of people, can not also achieve the same effect. In terms of effect, there’s no difference.

However, the biggest difference between engineering and ideal is that engineering can affect the normal flow due to a series of external factors.

For example, I originally planned to hold the lottery for 2 days from 2022.1.8 to 2022.1.10, and the lottery drawing time was set at 2022.1.11. If the server breaks down and leads to data loss, how can I guarantee the total number of people? Our reservoir sampling algorithm updates the number of winners.

When you encounter this kind of interview question, go straight to three steps: principle -> proof -> code, one wave to take the interviewer away

This period of reservoir algorithm is over here, the next phase will be about computer network related knowledge, let me disclose in advance, computer network eight topics are also quite research.

I am a Java development engineer of a unicorn enterprise. I hope you can pay attention to it. If you have any questions, you can leave a message or add a private message to my wechat.