Author: Cola
Source: Coke’s path to data analysis
Please contact the author for authorization before reprinting (WeChat ID: data_COLA)
A summary of common probability and statistics questions from data analysis interviews, updated from time to time.
Meaning of random variable
A random variable X assigns a number to each possible outcome of a random experiment, and takes each of its values with a certain probability P(X = x). An example is the number of points shown when a die is rolled.
What is the relationship between random variables and random experiments
- Random experiment: an experiment in which a random phenomenon is observed repeatedly, many times, under the same conditions, such as flipping a coin 100 times
- Random variable: a numerical description of the result of a random experiment, such as the number of heads in those 100 flips
The basis for distinguishing continuous and discrete random variables
- Discrete random variable: the values of X can be listed one by one, such as the number of defective products in a batch or the number of births in a region.
- Continuous random variable: the values of X cannot be listed one by one because they fill an interval, such as the lifespan of a batch of electronic components, height, or weight.
So the two are distinguished by whether the set of possible values is countable.
The distinction between independence and uncorrelatedness
If X and Y are uncorrelated, there is no linear relationship between X and Y, but some other (nonlinear) relationship may still exist
If X and Y are independent, there is no relationship of any kind between them
Thus, “uncorrelated” is a weaker concept than “independent”: independence implies uncorrelatedness, but not the other way around
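A small exact calculation can make the gap between the two concepts concrete. The sketch below is only an illustration under assumed values: X takes -1, 0, 1 with equal probability and Y = X^2 (both choices are hypothetical, not from the original notes). The covariance comes out as zero even though Y is completely determined by X.

```python
from fractions import Fraction

# Assumed toy distribution: X is -1, 0 or 1 with equal probability, and Y = X^2.
xs = [-1, 0, 1]
p = Fraction(1, 3)

def expect(f):
    """Expectation of f(X) under the equal-probability distribution on xs."""
    return sum(p * f(x) for x in xs)

e_x = expect(lambda x: x)            # E[X]
e_y = expect(lambda x: x * x)        # E[Y], with Y = X^2
e_xy = expect(lambda x: x * x * x)   # E[XY] = E[X^3]
cov_xy = e_xy - e_x * e_y            # Cov(X, Y)

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
p_joint = p        # X = 0 forces Y = 0, so the joint probability is 1/3
p_product = p * p  # P(X=0) * P(Y=0) = 1/3 * 1/3

print("Cov(X, Y) =", cov_xy)
print("P(X=0, Y=0) =", p_joint, "but P(X=0)P(Y=0) =", p_product)
```

The covariance is zero, so X and Y are uncorrelated, yet the joint probability differs from the product of the marginals, so they are not independent.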
The distribution function/probability density function for common distributions, and the properties of distributions.
Common distributions are covered in two groups, discrete and continuous:
The distribution of discrete random variables
- The binomial distribution
Run a series of independent trials -> each trial ends in either success or failure, with the same probability of success p every time -> the number of trials n is fixed.
Write X ~ B(n, p), where X is the number of successes in n trials. The quantity of interest is the probability of a given number of successes: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$
For example, if 100 coupons are issued and each is used independently with the same probability, the number X of coupons used follows a binomial distribution.
- Bernoulli distribution
Also called the 0-1 distribution: each trial has only 2 outcomes. It is the special case of the binomial distribution with n=1
For example, a single coin flip can only give heads or tails
- Geometric distribution
Independent trials -> each trial has the same probability of success -> X is the number of trials needed to get the first success, for example how many draws it takes before a particular card is obtained
- Poisson distribution
A single event occurs randomly and independently within a given interval (of time or space) -> the average number of events in this interval is known and finite.
For example, if on average 10 vehicles per hour come to a gas station to refuel, the number of vehicles arriving in a given hour follows a Poisson distribution with mean 10
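The discrete distributions listed above can be sketched in a few lines using their standard probability mass functions. The function names and the numbers used (a 30% coupon usage rate, a 20% success rate) are assumptions made for illustration only; only the mean of 10 cars per hour comes from the text.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def geometric_pmf(k, p):
    """P(the first success occurs on trial k), for k = 1, 2, ..."""
    return (1 - p)**(k - 1) * p

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

p_use = 0.3  # hypothetical probability that a coupon is used
print(binomial_pmf(30, 100, p_use))  # exactly 30 of 100 coupons are used
print(binomial_pmf(1, 1, p_use))     # Bernoulli case: n = 1, one success
print(geometric_pmf(3, 0.2))         # first success on the 3rd trial
print(poisson_pmf(10, 10))           # exactly 10 cars in an hour, mean 10
```

Setting n = 1 in `binomial_pmf` reproduces the Bernoulli distribution, which is why it is described as a special case.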
Distribution of continuous random variables
- Normal distribution
Also known as the Gaussian distribution; it is determined by two parameters, the mean μ and the variance σ²
- Uniform distribution
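Also known as the rectangular distribution; its probability density function is constant over the interval
A perfectly uniform distribution is extremely rare in nature. On an interval [a, b] its probability density function is: $f(x) = \frac{1}{b-a}$ for $a \le x \le b$, and $f(x) = 0$ otherwise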
- Exponential distribution
The exponential distribution is a probability distribution that describes the time between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant average rate. For example, the time between passengers arriving at an airport and the lifetimes of many electronic products are generally modeled by the exponential distribution.
Its probability density function is: $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$ (and 0 for $x < 0$), where $\lambda > 0$ is the average rate of events
The exponential distribution has the key property of memorylessness. This means that if a random variable T follows an exponential distribution, then for all s, t > 0, P(T > s + t | T > t) = P(T > s). That is, if T is the lifetime of a component, then given that the component has already been used for t hours, the conditional probability that it lasts at least s + t hours in total equals the probability that a new component lasts at least s hours.
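The memoryless property can be checked numerically from the survival function P(T > x) = e^(-rate * x). The rate and the two durations below are hypothetical values chosen only for this sketch.

```python
import math

def survival(x, rate):
    """P(T > x) for an exponential lifetime T with the given rate."""
    return math.exp(-rate * x)

rate = 0.5       # hypothetical: on average 0.5 failures per hour
s, t = 3.0, 2.0  # hypothetical durations in hours

# P(T > s + t | T > t) = P(T > s + t) / P(T > t)
conditional = survival(s + t, rate) / survival(t, rate)
unconditional = survival(s, rate)

print(conditional, unconditional)
```

Both numbers agree up to floating-point rounding, whatever positive values are chosen for the rate, s and t.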
Covariance and correlation coefficient
- Covariance
Covariance only indicates the direction of the relationship
It measures how two variables vary together. Variance is the special case of covariance in which the two variables are the same.
If two variables move in the same direction, that is, when one is above its expected value the other tends to be above its expected value too, then the covariance between them is positive (you get bigger, I get bigger). If they move in opposite directions, that is, when one is above its expected value the other tends to be below its expected value, then the covariance is negative.
In other words, a positive covariance means the two variables change in the same direction, and a negative covariance means they change in opposite directions
The absolute value of the covariance does not reflect the degree of linear correlation, because it depends on the scale of the variables
Two random variables whose covariance is 0 are uncorrelated
- The correlation coefficient
It indicates not only the direction of the linear relationship but also its strength
Its value lies in [-1, 1]; the closer the absolute value is to 1, the stronger the linear relationship.
The correlation coefficient can be viewed as a special covariance: the covariance after standardization, which removes the effect of the units of the two variables, that is, $\rho_{XY} = \mathrm{Cov}(X, Y) / (\sigma_X \sigma_Y)$.
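To see why covariance depends on units while the correlation coefficient does not, here is a minimal sketch with hand-written helpers. The height and weight figures are made-up illustration data, and the helper names are assumptions of this example.

```python
import math

def covariance(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data: heights in metres and weights in kilograms.
height_m = [1.60, 1.65, 1.70, 1.75, 1.80]
weight_kg = [55.0, 58.0, 66.0, 70.0, 77.0]
height_cm = [h * 100 for h in height_m]  # same heights, different unit

print(covariance(height_m, weight_kg), covariance(height_cm, weight_kg))
print(correlation(height_m, weight_kg), correlation(height_cm, weight_kg))
```

Switching from metres to centimetres multiplies the covariance by 100 but leaves the correlation unchanged, which is the sense in which the correlation coefficient is a standardized covariance.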
Is the median equal to the expectation
Not always. For a symmetric distribution such as the standard normal distribution, the median equals the expectation. For a right-skewed (positively skewed) distribution, the median is typically less than the expectation; for a left-skewed (negatively skewed) distribution, the median is typically greater than the expectation.
What are the basic features of a normal distribution
The normal distribution, also known as the Gaussian distribution, has a bell-shaped curve. The curve is symmetric, with the highest probability density at the center and lower density the farther you move toward either side. μ determines the center of the curve, and σ determines its dispersion. The larger σ is, the flatter the curve; the smaller σ is, the steeper the curve.
Many quantities in practice are approximately normally distributed, such as height and weight. The normal distribution is also widely used in quality management. The “3σ principle” is based on the normal distribution. The 3σ principle states:
- The probability that a value falls in (μ - σ, μ + σ) is about 0.6827
- The probability that a value falls in (μ - 2σ, μ + 2σ) is about 0.9545
- The probability that a value falls in (μ - 3σ, μ + 3σ) is about 0.9973
Therefore, almost all values can be considered to lie in the range (μ - 3σ, μ + 3σ). The probability of falling outside this range is less than 0.3%, which is a low-probability event that usually does not occur in a single experiment. If it does occur, it can be taken as a sign of a quality anomaly.
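The three probabilities can be reproduced from the standard normal distribution using the identity P(|Z| < k) = erf(k / sqrt(2)). The helper name below is an assumption of this sketch. Values computed directly can differ in the last digit from figures read off rounded tables.

```python
import math

def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normally distributed X."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))

# Probability of falling outside the 3-sigma range.
print(round(1 - prob_within(3), 4))
```

The result does not depend on μ or σ, because standardizing X removes both parameters.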
List the commonly used laws of large numbers and their differences
When a random event is repeated a large number of times, an almost inevitable regularity emerges; this is the law of large numbers. In layman’s terms, the theorem states that if an experiment is repeated many times under unchanged conditions, the frequency of a random event approaches its probability. There is a kind of certainty within chance.
Consider the random experiment of flipping a coin repeatedly and counting the number of heads in n flips. For different n the frequency of heads may differ, but as the number of flips grows, the frequency of heads approaches 1/2. That is the law of large numbers.
More generally, for a random variable X, the average of the observed values gets closer and closer to E(X) as the number of trials increases.
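The coin-flip description can be tried out with a short simulation. This is only a sketch: the seed, the flip counts and the function name are arbitrary choices, and a simulation illustrates the law rather than proving it.

```python
import random

def heads_frequency(n_flips, rng):
    """Fraction of heads in n_flips simulated fair coin flips."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (10, 100, 10_000, 100_000):
    print(n, heads_frequency(n, rng))
```

For small n the frequency can be far from 1/2, while for large n it settles close to 1/2.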
Central limit theorem
Assume a set of random variables is independent and identically distributed with finite variance. When n is large enough, the distribution of their mean is close to a normal distribution, whatever the distribution of the individual variables
The central limit theorem is used in two ways: (1) When it is impossible to obtain all the data of the population, samples can be used to estimate the population. (2) The population mean and standard deviation can be used to judge whether a sample belongs to the population.
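A simulation gives a feel for the theorem. The sketch below assumes a Uniform(0, 1) population, a sample size of 30 and 5000 repetitions, all arbitrary choices for illustration. It checks that the sample means behave like a normal variable with the theoretical center and spread.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 30                  # size of each sample
n_samples = 5000        # number of sample means to collect

# Each sample mean averages n draws from Uniform(0, 1), which is not bell shaped.
means = [statistics.fmean([rng.random() for _ in range(n)]) for _ in range(n_samples)]

center = statistics.fmean(means)
spread = statistics.stdev(means)
within_one_sd = sum(abs(m - center) < spread for m in means) / n_samples

print("mean of sample means:", center)        # theory: 0.5
print("sd of sample means:", spread)          # theory: sqrt(1/12) / sqrt(n)
print("share within one sd:", within_one_sd)  # for a normal distribution: about 0.68
```

A single Uniform(0, 1) draw has only about 58% of its mass within one standard deviation of the mean, so a share near 68% is a sign that the averages have become approximately normal.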
The basic idea of hypothesis testing
Hypothesis testing is a proof by contradiction based on the small-probability principle. In other words, to test whether a hypothesis holds, we first assume that the null hypothesis is true. If, under that assumption, an unreasonable event occurs, the difference between the sample and the population is significant and the null hypothesis is rejected; if no unreasonable event occurs, the null hypothesis is not rejected.
The unreasonable event mentioned here is a low-probability event. In general, we assume that a low-probability event will not occur in a single trial. If it does occur, the assumption that made it improbable is called into doubt, so we reject the null hypothesis.
Two types of errors in hypothesis testing
- Type I error (rejecting a true hypothesis): the null hypothesis is true, but we reject it.
- Type II error (accepting a false hypothesis): the null hypothesis is false, but we do not reject it.
How do you balance these two types of errors?
We would like to minimize the probability of both kinds of error. However, for a fixed sample size, reducing the probability of a Type I error inevitably increases the probability of a Type II error. In practice, we first control the probability of a Type I error so that it does not exceed the significance level. The probability of a Type II error depends on the sample size, so the choice of sample size should also be considered.
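The trade-off can be made concrete with a one-sided z-test. The setup is an assumption of this sketch, not something given in the notes: H0 says the mean is 0, the true mean is 0.5, the standard deviation is known to be 1, and `type_ii_error` is a hypothetical helper name.

```python
from statistics import NormalDist

def type_ii_error(alpha, n, effect=0.5, sigma=1.0):
    """Type II error rate of a one-sided z-test of H0: mu = 0 when the true mean is `effect`."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)    # reject H0 when the z statistic exceeds this value
    shift = effect * n ** 0.5 / sigma  # mean of the z statistic when the true mean is `effect`
    return z.cdf(critical - shift)     # probability of failing to reject H0

for alpha, n in [(0.05, 25), (0.01, 25), (0.05, 100)]:
    print(alpha, n, round(type_ii_error(alpha, n), 3))
```

With n fixed at 25, tightening the significance level from 0.05 to 0.01 raises the Type II error rate; keeping the level at 0.05 and raising n to 100 lowers it.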
Explain the meaning of the p-value
- P-value: the probability, computed under the assumption that the null hypothesis is true, of obtaining the observed sample result or a more extreme one
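As a worked example of this definition, the sketch below computes an exact one-sided p-value for a coin. The data (60 heads in 100 flips), the significance level of 0.05 and the function name are all hypothetical choices for illustration.

```python
import math

def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ B(n, p): the one-sided p-value for observing k successes."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical data: 60 heads in 100 flips; H0 says the coin is fair.
p_value = binomial_upper_tail(60, 100)
alpha = 0.05  # assumed significance level

print(round(p_value, 4), "reject H0" if p_value < alpha else "do not reject H0")
```

Here "more extreme" means 60 or more heads, so the p-value adds up the probabilities of all those outcomes under the assumption of a fair coin.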
Distinguish between significance levels and confidence intervals
- Significance level: the threshold, chosen in advance, for how improbable a sample result must be under the null hypothesis before we reject it, i.e., the probability assigned to the low-probability event. A hypothesis test assumes what the true value is and then checks whether that assumption is tenable.
- Confidence interval: the goal is to construct an interval from the sample in the hope that it contains the true value, without knowing what the true value is.
Conditional probability
P(A|B) = P(AB)/P(B). The conditional probability P(A|B) is the probability that event A occurs given that event B has occurred, P(AB) is the probability that A and B both occur, and P(B) is the probability that event B occurs. Rearranging gives: P(A|B) * P(B) = P(AB) = P(B|A) * P(A)
Total probability formula
Suppose event B can happen in two ways: together with event A, or without event A (written $\bar{A}$). The probability of event B is then given by:
$P(B) = P(AB) + P(\bar{A}B)$
From the definition of conditional probability:
$P(AB) = P(B|A)P(A)$ and $P(\bar{A}B) = P(B|\bar{A})P(\bar{A})$
Substituting gives:
$P(B) = P(B|A)P(A) + P(B|\bar{A})P(\bar{A})$
This is the total probability formula, which computes the probability of an event from conditional probabilities.
Bayes formula
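If the conditional probability P(B|A) is known, Bayes' formula provides a way to compute the inverse conditional probability P(A|B). Start from the definition of conditional probability:
$P(A|B) = \frac{P(AB)}{P(B)}$
Using the product rule $P(AB) = P(B|A)P(A)$:
$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$
Then substitute the total probability formula for P(B), where $\bar{A}$ denotes that A does not occur, to get:
$P(A|B) = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\bar{A})P(\bar{A})}$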
Consider an interesting case: a theft in a supermarket. Suspect A has a 10% probability of being the thief, and suspect B has a 90% probability. An eyewitness says the thief was A, and the eyewitness's testimony is 80% reliable.
- The probability that A is the thief: $P(A) = 10\%$
- The probability that B is the thief, i.e. that A is not: $P(B) = P(\bar{A}) = 90\%$
- Let C be the event that the witness says the thief is A
- Given that A is the thief, the probability that the witness says it is A: $P(C|A) = 80\%$
- Given that A is not the thief, the probability that the witness says it is A: $P(C|\bar{A}) = 20\%$
What we need is P(A|C): the probability that A is the thief given the witness's testimony.
We want the conditional probability P(A|C) and know the conditional probability P(C|A), which is exactly its inverse, so Bayes' formula applies:
$P(A|C) = \frac{P(A)P(C|A)}{P(A)P(C|A) + P(\bar{A})P(C|\bar{A})} = \frac{10\% \times 80\%}{10\% \times 80\% + 90\% \times 20\%} = \frac{8\%}{26\%} \approx 30.77\%$
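The same calculation can be written as a small function. This is a minimal sketch; the function and variable names are assumptions of the example, while the probabilities are the ones from the supermarket case.

```python
from fractions import Fraction

def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """P(H | E) by Bayes' formula, with P(E) taken from the total probability formula."""
    evidence = prior * p_evidence_given_h + (1 - prior) * p_evidence_given_not_h
    return prior * p_evidence_given_h / evidence

p_a = Fraction(10, 100)              # P(A): A is the thief
p_c_given_a = Fraction(80, 100)      # P(C | A): witness says A when A is the thief
p_c_given_not_a = Fraction(20, 100)  # P(C | not A): witness says A when A is not the thief

p_a_given_c = posterior(p_a, p_c_given_a, p_c_given_not_a)
print(p_a_given_c, round(float(p_a_given_c), 4))
```

Even with a witness who is right 80% of the time, the low prior of 10% keeps the posterior probability that A is the thief well below one half.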