- π About the author: Hello, EVERYONE, I love to knock code xiao Huang, the Unicorn enterprise Java development engineer
- π personal public number: love to knock code of small huang
- π series of columns: Java design patterns, data structures, and algorithms
- π§ If the article knowledge point has the wrong place, please correct! Learn with you and make progress π
- π₯ if you feel that the blogger’s article is good, please π three even support π a blogger oh
- π bloggers are working hard to complete project 2022: dream as a horse, set sail, 2022 Dream chasers
One, the introduction
Recently, xiao Huang’s friend went to an industry famous game company (interested readers can guess) interview, out of a classic topic: at present, the game needs to carry out a sign-in lottery activity, how to ensure that everyone draws the probability of the prize is equal, to ensure that the number of the final winner is 10 people
Huang’s friends calculate the total number of people who check in, use the Random function to get the winners, and so on, and finally find 10 winners
But the interviewer wasn’t happy with this and offered several conditions:
- The number of people who signed in
N
Large and unknowable - Randomly selected
10
Individuals, each of them has a probability of being selected10/N
- Time complexity is
O(N)
Xiao Huang friend momentarily a little confused, at a loss, chat about something else, scribbled over the interview
My friend talked with Xiao Huang and finally found that the interviewer wanted to investigate Reservoir Sampling.
Energetic readers can check out these template questions in advance:
- 382. Linked list random nodes
- 398. Random number index
- 519. Random inversion matrix
- 497. A random point in a nonoverlapping rectangle
2. Reservoir sampling algorithm
According to the conditions that the number of people N is unknowable, the probability of each candidate is 10/N, and the time complexity is O(N), we can know that our algorithm must traverse once, and the probability of each candidate decreases gradually with the increase of the number of sign-in, but the probability of each candidate is the same
For (int num = 1; num <= 100; num++)
- when
num = 10
, the probability that we pick each number is10/10
- when
num = 50
, the probability that we pick each number is10/50
- when
num = 100
, the probability that we pick each number is10/100
We can see that with the dynamic increase of the number of people, our probability is different. If we simply use the Random algorithm above, we have no way to adjust dynamically
Below, let’s take a look at the secret of reservoir sampling algorithm
1. The train of thought
Our overall pool is as follows:
The currentnum <= 10
“, we let it directly into the pool
If the current num is greater than 10, we randomize from 1 to num
- If the random result is less than or equal to 10, the pool is replaced
- If the random result is greater than 10, no operation is done
Assuming the current value is 12, we use random.nextint (12) + 1 to get random values from 1 to 12
-
Let’s say the random value is zero
5
,12
Subscript the replacement pool to5
The number of -
Let’s say the random value is zero
11
, no operation is performed
The final number in the reservoir is the number of people who signed in and won
At this point, I’m sure 80% of you will be confused, and that’s okay. I’ve been there, too
Let’s do the following proof
2. To prove
We assume that the current num is 30, so each value has a 10/30 chance of being selected
First, we prove the probability of 3:
- for
3
In terms of the probability of entering the pool:100% by 100
The probability of getting out of the pool is equal to the probability of being replacednum > 10
)11
To be able to replace3
The probability of:(10/11) * (1/10) = (1/11)
(Probability of replacing pools * random pools is3
The probability of)12
To be able to replace3
The probability of:(10/12) * (1/10) = (1/12)
- .
- Let’s think about it the other way around, so let’s
3
What’s the probability of not getting out of the pool?(3) Probability of entering the pool * (3) probability of not getting out of the pool
1 * (10/11) * (11/12) * (12/13) *... (29/30) = (10/30)
- It fits our final result
Finally, we prove the probability of 17:
- for
17
In terms of the probability of entering the pool:10/17
The probability of getting out of the pool is equal to the probability of being replacednum > 17
)18
To be able to replace17
The probability of:(10/18) * (1/10) = (1/18)
(Probability of replacing pools * random pools is17
The probability of)19
To be able to replace17
The probability of:(10/19) * (1/10) = (1/19)
- .
- Let’s think about it the other way around
17
What’s the probability of not getting out of the pool?(17) Probability of getting into the pool * (17) probability of not getting out of the pool
(17/18) (10/17) * * * (19/20) (18/19) *... (29/30) = (10/30)
- It fits our final result
3. Code
public class _Reservoir algorithm{
static Random random = new Random();
public static void main(String[] args) {
// 10000 tests
int test = 10000;
// The number is 100
int dataBase = 100;
// Record the frequency of each occurrence
int[] count = new int[101];
for (int i = 0; i < test; i++) {
/ / reservoir
int[] bag = new int[10];
// Reservoir subscript
int bagIndex = 0;
for (int num = 1; num <= dataBase; num++) {
// If less than 10, go directly to the pool
if (num <= 10) {
bag[bagIndex++] = num;
} else {
// If it is greater than 10, the pool can be pooled with a probability of 10 / I
// if less than or equal to 10, the pool can be added
if (getNum(num) <= 10) {
// Randomly eliminate a number in the pool
int index = random.nextInt(10); bag[index] = num; }}}// Record the current occurrence frequency
for (intnum : bag) { count[num]++; }}// Output and verify our random rate
for (int i = 0; i < count.length; i++) { System.out.println(count[i]); }}// Return the number of [1, I]
public static int getNum(int i) {
return random.nextInt(i) + 1; }}Copy the code
Test results: We found that the frequency of occurrence is roughly equal
999
1009
967
1050
981
996
996
950
971
998
1005
1032
974
982
1007
1029
1026
1045
Copy the code
Three, sample exercises
As the saying goes, it’s all about seeing, it’s all about doing, and it’s time to test your results
Buckle template title:
- 382. Linked list random nodes
- 398. Random number index
- 519. Random inversion matrix
- 497. A random point in a nonoverlapping rectangle
Basic is the template problem, directly set above the formula can be
Let me play a small advertisement here, pay attention to the public number: love to knock code huang, reply: algorithm source code, not only can obtain reservoir and other types of algorithm source code.
Your attention may bring me closer to the peak of success, study hard together, and grow hard.
Four,
That’s the end of the basic reservoir sampling algorithm
In engineering mainly applied to the lottery system, here some people may question, I directly wait for the lottery finished, and then calculate the total number of people, can not also achieve the same effect. In terms of effect, there’s no difference.
However, the biggest difference between engineering and ideal is that engineering can affect the normal flow due to a series of external factors.
For example, I originally planned to hold the lottery for 2 days from 2022.1.8 to 2022.1.10, and the lottery drawing time was set at 2022.1.11. If the server breaks down and leads to data loss, how can I guarantee the total number of people? Our reservoir sampling algorithm updates the number of winners.
When you encounter this kind of interview question, go straight to three steps: principle -> proof -> code, one wave to take the interviewer away
This period of reservoir algorithm is over here, the next phase will be about computer network related knowledge, let me disclose in advance, computer network eight topics are also quite research.
I am a Java development engineer of a unicorn enterprise. I hope you can pay attention to it. If you have any questions, you can leave a message or add a private message to my wechat.