Original link:tecdat.cn/?p=7637
Original source:Tuo End number according to the tribe public number
This article describes the basic Bayesian statistical analysis process using PyMC3.
Random as NPR # numpy import matplotlib.pyplot as PLT # Matplotlib is the import matplotlib as MPL for drawingCopy the code
Bayes formula
六四风波
Common statistical analysis problems
- Parameter estimation: “Is the true value equal to X?”
- Compare the two groups of experimental data: “Is the experimental group different from the control group? “
Problem 1: Parameter estimation
“Is the true value X?”
Or,
“Given the data, what is the probability distribution of the possible values for the parameters of interest?”
Example 1: Coin toss problem
I flipped my coin n times and got h heads. Is this coin biased?
Parameterized problem of parameter estimation
A priori assumptions
- Pre-assumed distribution of parameters:P ~ Uniform (0, 1)
- Likelihood function; likelihood functionThe data ~ Bernoulli (p)
Toss total = 30 N_heads = 11 n_tails = total - n_heads tosses = [1] * n_heads + [0] * n_tails shuffle(tosses)Copy the code
data
fig = plot_coins()
plt.show()
Copy the code
Copy the code
MCMC Inference Button (TM)
100% | █ █ █ █ █ █ █ █ █ █ | 2500/2500 [00:00 "00:00, 3382.23 it/s]Copy the code
The results of
pm.traceplot(coin_trace)
plt.show()
Copy the code
In [10]:
plt.show()
Copy the code
- 95% highest posterior density (HPD, approximately similar to confidence interval) contains region of practical equivalence (ROPE, actual equivalent interval).
Example 2: Drug problems
I have a newly developed X; How good is X at preventing influenza virus replication?
The experiment
- Test the concentration range of X to measure influenza activity
- Calculate IC50: X concentration that can inhibit virus replication activity by 50%.
data
import pandas as pd
chem_df = pd.DataFrame(chem_data)
chem_df.columns = ['concentration', 'activity']
chem_df['concentration_log'] = chem_df['concentration'].apply(lambda x:np.log10(x))
# df.set_index('concentration', inplace=True)
Copy the code
Parameterized problem
Given the uncertainty surrounding it, what is the IC50 value of the chemical?
A priori knowledge
- Measurement function known from pharmaceutical knowledge:M = beta 1 + ex – IC50
- Parameter estimation in the measurement function, derived from prior knowledge:Beta – HalfNormal (1002).
- Prior knowledge of parameters of interest:The log (IC50) ~ ImproperFlat
- likelihood function: The data ~ N (m, 1)
data
In [13]:
fig = plot_chemical_data(log=True)
plt.show()
Copy the code
MCMC Inference Button (TM)
In [16]:
Pm.traceplot (IC50_trace [2000:], varnames=['IC50_log10', 'IC50']) # Real-time: an example from Step 2000. plt.show()Copy the code
The results of
In [17]:
pm.plot_posterior(ic50_trace[4000:], varnames=['IC50'],
color='#87ceeb', point_estimate='mean')
plt.show()
Copy the code
The chemical has an IC50 of approximately [2 mM, 2.4 mM] (95% HPD). It’s not a good drug candidate.
The second kind of problem: comparison between experimental groups
“Is there a difference between the experimental group and the control group? “
Example 1: The effect of drugs on IQ
Does medication affect IQ scores?
def ECDF(data): x = np.sort(data) y = np.cumsum(x) / np.sum(x) return x, y def plot_drug(): Plot (x_drug, y_drug, label='drug ', FIG. Plt.figure () ax = FIG. Add_subplot (1,1,1) x_drug, y_drug = ECDF(drug) n={0}'.format(len(drug))) x_placebo, y_placebo = ECDF(placebo) ax.plot(x_placebo, y_placebo, label='placebo, N ={0}'. Format (len(placebo)) ax.legend() ax.set_xlabel('IQ Score') ax.set_ylabel('Cumulative Frequency') ax.hlines(0.5, ax.get_xlim()[0], ax.get_xlim()[1], linestyle='--') return figCopy the code
Out[19]:
Ttest_indResult (statistic = 2.2806701634329549, pvalue = 0.025011500508647616)Copy the code
The experiment
-
Participants were randomly divided into two groups:
The treatment group
vs.The placebo group
-
Participants’ IQ scores were measured
A priori knowledge
- The measured data conforms to t distribution:Data ~ StudentsT (mu, sigma, argument)
The following are several parameters of the T-distribution:
- The mean is normally distributed:Mu ~ N (0100-2)
- The degrees of freedom conform to the exponential distribution:Argument: Exp (30)
- The variance is positively – distributed:Sigma ~ HalfCauchy (1002).
data
In [20]:
fig = plot_drug()
plt.show()
Copy the code
code
In [21]:
y_vals = np.concatenate([drug, placebo])
labels = ['drug'] * len(drug) + ['placebo'] * len(placebo)
data = pd.DataFrame([y_vals, labels]).T
data.columns = ['IQ', 'treatment']
Copy the code
MCMC Inference Button (TM)
The results of
In [24]:
pm.traceplot(kruschke_trace[2000:],
varnames=['mu_drug', 'mu_placebo'])
plt.show()
Copy the code
In [25]:
pm.plot_posterior(kruschke_trace[2000:], color='#87ceeb',
varnames=['mu_drug', 'mu_placebo', 'diff_means'])
plt.show()
Copy the code
- The difference of IQ mean is: [0.5, 4.6]
- Frequentism p-value:0.02(!!!!!!!!!!!!!!!!!
Note: a p-value=0.02 indicates a difference between groups, but does not say how big the difference is.
In [27]:
ax = adjust_forestplot_for_slides(ax)
plt.show()
Copy the code
Forest plot: 95% HPD (thin line), IQR (thick line) and median (dot) posterior distribution on the same axis, allowing us to directly compare treatment and control groups.
Copy the code
In [29]:
ax = pm.plot_posterior(kruschke_trace[2000:],
varnames=['effect_size'],
color='#87ceeb')
overlay_effect_size(ax)
Copy the code
- The medicine is probably irrelevant.
- There is no evidence of biological significance.
Example 2: Mobile phone disinfection problem
Compare two commonly used disinfection methods and see which one is better
The experiment design
- The phones were randomly divided into six groups: 4 “Fancy” methods + 2 “control” methods.
- Before and after treatment, the cell phone surface swab culture
- Count the number of colony, and compare the colony count before and after treatment
Out[30]:
sample_id int32
treatment int32
colonies_pre int32
colonies_post int32
morphologies_pre int32
morphologies_post int32
year float32
month float32
day float32
perc_reduction morph float32
site int32
phone ID float32
no case float32
frac_change_colonies float64
dtype: object
Copy the code
data
In [32]:
fig = plot_colonies_data()
plt.show()
Copy the code
A priori knowledge
The colony count conforms to Poisson distribution.
- Colony count conforms to Poisson distribution:Dataij ~ Poisson (mu ij), j ∈ (pre and post), I ∈ [1, 2, 3…
- The parameters of poisson distribution are discrete uniform distribution:Mu ij ~ DiscreteUniform (0104), j ∈ (pre and post), I ∈ [1, 2, 3…
- Sterilization effectiveness is measured by percentage change and is defined as follows:Mupre – mupostmupre
MCMC Inference Button (TM)
In [34]:
with poisson_estimation:
poisson_trace = pm.sample(200000)
Copy the code
Assigned Metropolis to pre_mus Assigned Metropolis to 100% post_mus | █ █ █ █ █ █ █ █ █ █ | 200500/200500 [01:15 "00:00, 2671.98 it/s]Copy the code
In [35]:
pm.traceplot(poisson_trace[50000:], varnames=['pre_mus', 'post_mus'])
plt.show()
Copy the code
The results of
In [39]:
pm.forestplot(poisson_trace[50000:], varnames=['perc_change'],
ylabels=treatment_order) #, xrange=[0, 110])
plt.xlabel('Percentage Reduction')
ax = plt.gca()
ax = adjust_forestplot_for_slides(ax)
Copy the code