Original link:tecdat.cn/?p=7637

Original source:Tuo End number according to the tribe public number

 

This article describes the basic Bayesian statistical analysis process using PyMC3.

Random as NPR # numpy import matplotlib.pyplot as PLT # Matplotlib is the import matplotlib as MPL for drawingCopy the code

 

Bayes formula

六四风波

 

Common statistical analysis problems

  • Parameter estimation: “Is the true value equal to X?”
  • Compare the two groups of experimental data: “Is the experimental group different from the control group? “

Problem 1: Parameter estimation

“Is the true value X?”

Or,

“Given the data, what is the probability distribution of the possible values for the parameters of interest?”

Example 1: Coin toss problem

I flipped my coin n times and got h heads. Is this coin biased?

Parameterized problem of parameter estimation

 

A priori assumptions

  • Pre-assumed distribution of parameters:P ~ Uniform (0, 1)
  • Likelihood function; likelihood functionThe data ~ Bernoulli (p)
Toss total = 30 N_heads = 11 n_tails = total - n_heads tosses = [1] * n_heads + [0] * n_tails shuffle(tosses)Copy the code

data

 

fig = plot_coins()
plt.show()    
Copy the code

 
Copy the code

MCMC Inference Button (TM) 

 

100% | █ █ █ █ █ █ █ █ █ █ | 2500/2500 [00:00 "00:00, 3382.23 it/s]Copy the code

The results of

 

pm.traceplot(coin_trace)
plt.show()
Copy the code

In [10]:

 
plt.show()
Copy the code

  • 95% highest posterior density (HPD, approximately similar to confidence interval) contains region of practical equivalence (ROPE, actual equivalent interval).

 

Example 2: Drug problems

I have a newly developed X; How good is X at preventing influenza virus replication?

The experiment

  • Test the concentration range of X to measure influenza activity
  • Calculate IC50: X concentration that can inhibit virus replication activity by 50%.

data 

 
import pandas as pd

chem_df = pd.DataFrame(chem_data)
chem_df.columns = ['concentration', 'activity']
chem_df['concentration_log'] = chem_df['concentration'].apply(lambda x:np.log10(x))
# df.set_index('concentration', inplace=True)
Copy the code

Parameterized problem

Given the uncertainty surrounding it, what is the IC50 value of the chemical?

A priori knowledge

  • Measurement function known from pharmaceutical knowledge:M = beta 1 + ex – IC50
  • Parameter estimation in the measurement function, derived from prior knowledge:Beta – HalfNormal (1002).
  • Prior knowledge of parameters of interest:The log (IC50) ~ ImproperFlat
  • likelihood function: The data ~ N (m, 1)

 

data

In [13]:

fig = plot_chemical_data(log=True)
plt.show()
Copy the code

MCMC Inference Button (TM) 

In [16]:

Pm.traceplot (IC50_trace [2000:], varnames=['IC50_log10', 'IC50']) # Real-time: an example from Step 2000. plt.show()Copy the code

The results of

In [17]:

pm.plot_posterior(ic50_trace[4000:], varnames=['IC50'], 
                  color='#87ceeb', point_estimate='mean')
plt.show()
Copy the code

The chemical has an IC50 of approximately [2 mM, 2.4 mM] (95% HPD). It’s not a good drug candidate.

The second kind of problem: comparison between experimental groups

“Is there a difference between the experimental group and the control group? “

Example 1: The effect of drugs on IQ

Does medication affect IQ scores?

def ECDF(data): x = np.sort(data) y = np.cumsum(x) / np.sum(x) return x, y def plot_drug(): Plot (x_drug, y_drug, label='drug ', FIG. Plt.figure () ax = FIG. Add_subplot (1,1,1) x_drug, y_drug = ECDF(drug) n={0}'.format(len(drug))) x_placebo, y_placebo = ECDF(placebo) ax.plot(x_placebo, y_placebo, label='placebo, N ={0}'. Format (len(placebo)) ax.legend() ax.set_xlabel('IQ Score') ax.set_ylabel('Cumulative Frequency') ax.hlines(0.5, ax.get_xlim()[0], ax.get_xlim()[1], linestyle='--') return figCopy the code

 

Out[19]:

Ttest_indResult (statistic = 2.2806701634329549, pvalue = 0.025011500508647616)Copy the code

The experiment

  • Participants were randomly divided into two groups:

    • The treatment group vs. The placebo group
  • Participants’ IQ scores were measured

A priori knowledge

  • The measured data conforms to t distribution:Data ~ StudentsT (mu, sigma, argument)

The following are several parameters of the T-distribution:

  • The mean is normally distributed:Mu ~ N (0100-2)
  • The degrees of freedom conform to the exponential distribution:Argument: Exp (30)
  • The variance is positively – distributed:Sigma ~ HalfCauchy (1002).
  •  

data

In [20]:

fig = plot_drug()
plt.show()
Copy the code

code

In [21]:

y_vals = np.concatenate([drug, placebo])
labels = ['drug'] * len(drug) + ['placebo'] * len(placebo)

data = pd.DataFrame([y_vals, labels]).T
data.columns = ['IQ', 'treatment']
Copy the code

MCMC Inference Button (TM) 

 

The results of

In [24]:

pm.traceplot(kruschke_trace[2000:], 
             varnames=['mu_drug', 'mu_placebo'])
plt.show()
Copy the code

In [25]:

pm.plot_posterior(kruschke_trace[2000:], color='#87ceeb',
            varnames=['mu_drug', 'mu_placebo', 'diff_means'])
plt.show()
Copy the code

  • The difference of IQ mean is: [0.5, 4.6]
  • Frequentism p-value:0.02(!!!!!!!!!!!!!!!!!

Note: a p-value=0.02 indicates a difference between groups, but does not say how big the difference is.

In [27]:

 
ax = adjust_forestplot_for_slides(ax)
plt.show()
Copy the code

Forest plot: 95% HPD (thin line), IQR (thick line) and median (dot) posterior distribution on the same axis, allowing us to directly compare treatment and control groups.

 
Copy the code

In [29]:

ax = pm.plot_posterior(kruschke_trace[2000:], 
                       varnames=['effect_size'],
                       color='#87ceeb')
overlay_effect_size(ax)
Copy the code

  • The medicine is probably irrelevant.
  • There is no evidence of biological significance.

Example 2: Mobile phone disinfection problem

Compare two commonly used disinfection methods and see which one is better

The experiment design

  • The phones were randomly divided into six groups: 4 “Fancy” methods + 2 “control” methods.
  • Before and after treatment, the cell phone surface swab culture
  • Count the number of colony, and compare the colony count before and after treatment

Out[30]:

sample_id                 int32
treatment                 int32
colonies_pre              int32
colonies_post             int32
morphologies_pre          int32
morphologies_post         int32
year                    float32
month                   float32
day                     float32
perc_reduction morph    float32
site                      int32
phone ID                float32
no case                 float32
frac_change_colonies    float64
dtype: object
Copy the code

data

In [32]:

fig = plot_colonies_data()
plt.show()
Copy the code

A priori knowledge

The colony count conforms to Poisson distribution.

  • Colony count conforms to Poisson distribution:Dataij ~ Poisson (mu ij), j ∈ (pre and post), I ∈ [1, 2, 3…
  • The parameters of poisson distribution are discrete uniform distribution:Mu ij ~ DiscreteUniform (0104), j ∈ (pre and post), I ∈ [1, 2, 3…
  • Sterilization effectiveness is measured by percentage change and is defined as follows:Mupre – mupostmupre

 

MCMC Inference Button (TM) 

In [34]:

with poisson_estimation:
    poisson_trace = pm.sample(200000)
Copy the code
Assigned Metropolis to pre_mus Assigned Metropolis to 100% post_mus | █ █ █ █ █ █ █ █ █ █ | 200500/200500 [01:15 "00:00, 2671.98 it/s]Copy the code

In [35]:

pm.traceplot(poisson_trace[50000:], varnames=['pre_mus', 'post_mus'])
plt.show()
Copy the code

The results of

In [39]:

pm.forestplot(poisson_trace[50000:], varnames=['perc_change'], 
              ylabels=treatment_order) #, xrange=[0, 110])
plt.xlabel('Percentage Reduction')

ax = plt.gca()
ax = adjust_forestplot_for_slides(ax)
Copy the code