Today we share a problem contributed by a member from Melbourne, nicknamed “Paper Machine” and known as the little prince of mathematics!


Special note: this article is not original; it is published with the contributor's consent.

01. Algorithm introduction

Expectation: In probability theory and statistics, the mathematical expectation (or mean) of a random variable is the sum of each possible outcome of an experiment multiplied by its probability. It is one of the most basic numerical characteristics of a random variable and reflects its average value.
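
In symbols, for a discrete random variable X taking value x_i with probability p_i (a standard formula, stated here for completeness):

E[X] = \sum_i x_i \, p_i

For example, a fair six-sided die has E[X] = (1+2+3+4+5+6)/6 = 3.5.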


Problem: scatter three points uniformly at random in a 1×1 square. Each pair of points forms the opposite vertices of an axis-aligned rectangle, so the three points determine three rectangles. We need to find the expectation of the area of the second-largest rectangle.
Algorithm: in each trial, sample three points at random, compute the second-largest rectangle area, and average over many trials to estimate the expectation (a baseline sketch follows below).
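
As a baseline for comparison, here is a minimal sketch of the plain Monte Carlo version, assuming C++11; the generator std::mt19937_64, the seed, and the sample count N are illustrative choices, and the author's quasi-Monte Carlo implementation appears in section 03.

#include <cmath>
#include <cstdio>
#include <random>
#include <algorithm>

int main() {
    std::mt19937_64 rng(12345);  // pseudo-random generator, fixed seed
    std::uniform_real_distribution<double> u(0.0, 1.0);
    const long long N = 10000000;  // number of trials
    double total = 0;
    for (long long i = 0; i < N; i++) {
        // three random points in the unit square
        double x1 = u(rng), y1 = u(rng);
        double x2 = u(rng), y2 = u(rng);
        double x3 = u(rng), y3 = u(rng);
        // each pair of points spans an axis-aligned rectangle
        double a1 = std::fabs((x1 - x2) * (y1 - y2));
        double a2 = std::fabs((x1 - x3) * (y1 - y3));
        double a3 = std::fabs((x2 - x3) * (y2 - y3));
        // sum minus max minus min leaves the second-largest area
        total += a1 + a2 + a3 - std::max({a1, a2, a3}) - std::min({a1, a2, a3});
    }
    printf("%.10f\n", total / N);  // Monte Carlo estimate of the expectation
    return 0;
}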

02. Monte Carlo

The Monte Carlo method, also known as the statistical simulation method or statistical test method, is a numerical simulation method that takes probabilistic phenomena as its object of study: it computes an unknown characteristic quantity by means of sampling surveys. It is named after Monte Carlo, Monaco's famous gambling town, because of its random-sampling nature, and it is well suited to computation and simulation tests of discrete systems. In computational simulation, one constructs a probability model that mirrors the system's behavior and runs random experiments on a digital computer to reproduce the system's stochastic characteristics.


The Monte Carlo method refers to a class of methods that use random variables to solve probabilistic problems. The most common applications are the calculation of integrals, probabilities, expectations, and so on.
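
For example, the integral of f over the unit cube [0,1]^d can be estimated by the sample mean of f at N uniform random points (a standard estimator, written here in LaTeX for clarity):

\int_{[0,1]^d} f(x)\,dx \approx \frac{1}{N}\sum_{i=1}^{N} f(x_i), \qquad x_i \sim U([0,1]^d)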


The common Monte Carlo method relies on the “randomness” of random variables, meaning that events that have not yet occurred cannot be predicted from existing information, as with coin flips or dice rolls. On a computer, the usual random numbers are generated by deterministic algorithms and are therefore called pseudo-random numbers. Because computational precision is finite, these numbers are “not random enough” in the statistical sense: predictable repeating sequences can occur, and the convergence accuracy achievable with them is limited.
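
A minimal C++11 illustration of this determinism (not from the original article; std::mt19937 is the standard-library Mersenne Twister): two generators constructed with the same seed emit exactly the same sequence.

#include <cstdio>
#include <random>

int main() {
    std::mt19937 a(42), b(42);  // two generators, identical seed
    for (int i = 0; i < 3; i++)
        printf("%u %u\n", (unsigned)a(), (unsigned)b());  // same value twice per line
    return 0;
}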


Unlike the common Monte Carlo method, the quasi-Monte Carlo method uses a low-discrepancy sequence (well-known examples are the Halton sequence, the Sobol sequence, etc.) instead of ordinary (pseudo-)random numbers, so it converges faster: with N samples in dimension d, the quasi-Monte Carlo convergence rate can reach O((log N)^d / N), while the rate of the ordinary Monte Carlo method is only O(N^(-1/2)). Another important property of quasi-Monte Carlo is that the low-discrepancy sequence it uses is reproducible: it does not change with the environment and involves no random seed. The pseudo-random numbers used in ordinary Monte Carlo, by contrast, give different results under different random seeds, and the convergence behavior differs as well.
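
For intuition (a standard example, not from the original article): the base-2 Halton sequence, also called the van der Corput sequence, mirrors the binary digits of the index 1, 2, 3, ... about the radix point, giving

1/2, 1/4, 3/4, 1/8, 5/8, 3/8, 7/8, ...

Each new term lands in one of the largest remaining gaps, so the points fill [0,1] far more evenly than independent uniform draws.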

03. Topic analysis

This problem is solved with the quasi-Monte Carlo method.


The C++ code is as follows:

#include <cmath>
#include <cstdio>
#include <vector>
#include <cassert>
#include <algorithm>
#include <omp.h>

const int UP = 100;
bool sieve[UP + 100];
int primes[UP], top = 0;

// Sieve of Eratosthenes: collect the small primes used as Halton bases.
void init() {
    for (int i = 2; i <= UP; i++)
        if (!sieve[i]) {
            primes[top++] = i;
            for (int j = i; j <= UP / i; j++) sieve[i * j] = true;
        }
}

// Generate the i-th point of the dim-dimensional Halton sequence;
// each coordinate lies in [0,1] and uses a distinct prime base.
std::vector<double> halton(long long i, const int &dim) {
    assert(dim <= top);
    std::vector<double> prime_inv(dim, 0), r(dim, 0);
    std::vector<long long> t(dim, i);
    for (int j = 0; j < dim; j++) prime_inv[j] = 1.0 / primes[j];
    // f sums the remaining digits; the loop stops once every base is exhausted.
    auto f = [](const std::vector<long long> &t) -> long long {
        long long ret = 0;
        for (const auto &e : t) ret += e;
        return ret;
    };
    while (f(t) > 0)
        for (int j = 0; j < dim; j++) {
            long long d = t[j] % primes[j];
            r[j] += d * prime_inv[j];
            prime_inv[j] /= primes[j];
            t[j] /= primes[j];
        }
    return r;
}

// One experiment: three points (6 coordinates) give three rectangles;
// return the second-largest area (sum minus max minus min).
double experiment(long long idx) {
    std::vector<double> li = halton(idx, 6);
    double area1 = fabs((li.at(0) - li.at(2)) * (li.at(1) - li.at(3)));
    double area2 = fabs((li.at(0) - li.at(4)) * (li.at(1) - li.at(5)));
    double area3 = fabs((li.at(2) - li.at(4)) * (li.at(3) - li.at(5)));
    double w = area1 + area2 + area3
             - std::max(std::max(area1, area2), area3)
             - std::min(std::min(area1, area2), area3);
    return w;
}

const int BATCH = 100000;
const int THREADS = 40;

int main() {
    init();
    double total = 0;
    for (long long trial = 0;;) {
        std::vector<double> li(THREADS, 0);
        omp_set_dynamic(0);
        omp_set_num_threads(THREADS);
        // Each thread accumulates its own partial sum over a batch of samples.
        #pragma omp parallel for
        for (long long thread = 0; thread < THREADS; thread++) {
            for (long long i = 0; i < BATCH; i++)
                li.at(thread) += experiment(trial + thread * BATCH + i);
        }
        for (const auto &d : li) total += d;
        trial += THREADS * BATCH;
        printf("%lld: %.10f\n", trial, total / trial);
        fflush(stdout);
    }
    return 0;
}


Analysis: parallel computation is used to run the random experiments in batches, which greatly improves speed. The halton function generates the Halton low-discrepancy sequence with range [0,1]; the parameter i denotes the i-th sample, and dim is the dimension of the generated point (each experiment here needs 3 points, i.e. 6 coordinates, so 6-dimensional points suffice). Different samples do not affect each other, so the computation can be parallelized for speed.
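
For reference, assuming GCC with OpenMP support and the source saved as main.cpp (the file name is hypothetical), a typical build and run would be:

g++ -O2 -fopenmp main.cpp -o qmc
./qmc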


# denotes the number of random trials (×10^7), Avg the running estimate of the expected second-largest area, and Err the absolute error from the true value (×10^(-10)).


# Avg Err # Avg Err
1 0.1017786804 55 2 0.1017786707 152
3 0.1017786905 46 4 0.1017786889 30
5 0.1017786809 50 6 0.1017786836 23
7 0.1017786849 10 8 0.1017786868 9
9 0.1017786799 60 10 0.1017786837 22
11 0.1017786845 14 12 0.1017786839 20
13 0.1017786874 15 14 0.1017786839 20
15 0.1017786848 11 16 0.1017786868 9
17 0.1017786851 8 18 0.1017786863 4
19 0.1017786854 5 20 0.1017786887 28
21 0.1017786858 1 22 0.1017786844 15
23 0.1017786841 18 24 0.1017786852 7
25 0.1017786849 10 26 0.101778684 19
27 0.1017786838 21 28 0.1017786852 7
29 0.1017786838 21 30 0.1017786846 13
31 0.1017786859 0 32 0.1017786862 3
33 0.1017786859 0 34 0.1017786853 6
35 0.1017786854 5 36 0.1017786859 0
37 0.101778685 9 38 0.1017786854 5
39 0.1017786853 6 40 0.1017786858 1
41 0.1017786848 11 42 0.1017786851 8
43 0.1017786847 12 44 0.1017786841 18
45 0.101778685 9 46 0.1017786842 17
47 0.1017786852 7 48 0.1017786848 11
49 0.1017786854 5 50 0.1017786851 8
51 0.1017786842 17 52 0.1017786844 15

As you can see, over the course of the experiments the estimate converges to 9 decimal places, which is very precise. Because pseudo-random numbers are “not random enough”, ordinary Monte Carlo converges to only about five decimal places with the same number of trials: its error shrinks as O(N^(-1/2)), and 1/√(5.2×10^8) ≈ 4×10^(-5), i.e. roughly five correct decimal places.


The above method extends to other randomized problems and is very practical and efficient; you are welcome to discuss it!


So, have you mastered today's problem? Leave your thoughts in the comments section!


I’ve compiled all the problems I’ve written into an e-book, complete with illustrations for each problem. Click to download