Original link: tecdat.cn/?p=22410

Original source: Tuoduan Data Tribe official WeChat account

 

This article works through a complete logistic regression analysis, to give you a basic idea of the analytical steps and the reasoning behind them.

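library(tidyverse)  # data wrangling and ggplot2
library(broom)      # tidy() and augment() for model output
library(knitr)      # kable() for the tables below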

The data come from an ongoing cardiovascular study of residents of the town of Framingham, Massachusetts. The goal is to predict whether a patient is at risk of developing coronary heart disease within the next 10 years. The data set includes the following variables.

  • male: 0 = female; 1 = male
  • age: age of the patient
  • education: 1 = some high school; 2 = high school; 3 = some college or vocational school; 4 = college or above
  • currentSmoker: 0 = non-smoker; 1 = smoker
  • cigsPerDay: number of cigarettes smoked per day (estimated average)
  • BPMeds: 0 = not taking blood pressure medication; 1 = taking blood pressure medication
  • prevalentStroke: 0 = no stroke in family history; 1 = stroke in family history
  • prevalentHyp: 0 = hypertension not prevalent in family history; 1 = hypertension prevalent in family history
  • diabetes: 0 = no; 1 = yes
  • totChol: total cholesterol (mg/dL)
  • sysBP: systolic blood pressure (mmHg)
  • diaBP: diastolic blood pressure (mmHg)
  • BMI: body mass index
  • heartRate: heart rate
  • glucose: total glucose (mg/dL)
  • TenYearCHD: 0 = patient does not have a 10-year risk of coronary heart disease; 1 = patient has a 10-year risk of coronary heart disease

Load and prepare the data

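heart_data <- read_csv("framingham.csv") %>%
  drop_na() %>%  # drop observations with missing values
  mutate(
    # assumed step: later output shows currentSmoker as a factor
    currentSmoker = as.factor(currentSmoker),
    ageCent = age - mean(age),              # mean-centered age
    totCholCent = totChol - mean(totChol)   # mean-centered total cholesterol
  )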

Fit the logistic regression model

risk_m <- glm(TenYearCHD ~ ageCent + currentSmoker + totCholCent,
              data = heart_data, family = binomial)

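Prediction

For a new patient

# A 60-year-old with total cholesterol 263; 49.552 and 236.848 appear to be the sample means
x0 <- tibble(ageCent = (60 - 49.552),
             totCholCent = (263 - 236.848),
             # assumed value: the source truncates the remaining arguments
             currentSmoker = factor(0, levels = c(0, 1)))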

Predicted log-odds

predict(risk_m, x0)

 

Predicted probability

predict(risk_m, x0, type = "response")

Based on this probability, do you think this patient is at high risk of coronary heart disease in the next 10 years? Why?
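The predicted log-odds and the predicted probability are linked by the inverse logit, p = exp(eta) / (1 + exp(eta)). The sketch below illustrates that conversion with a made-up value (`log_odds` is hypothetical, not output from `risk_m`) and compares the hand calculation with base R's `plogis()`.

```r
# Hypothetical log-odds value, for illustration only (not the fitted model's output)
log_odds <- -1.5

# Inverse logit by hand: p = exp(eta) / (1 + exp(eta))
prob_manual <- exp(log_odds) / (1 + exp(log_odds))

# The same conversion with the built-in logistic distribution function
prob_plogis <- plogis(log_odds)

c(manual = prob_manual, plogis = prob_plogis)
```

Applied to a fitted model, the same idea would be `plogis(predict(risk_m, x0))`, which should agree with `predict(risk_m, x0, type = "response")`.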

Confusion matrix


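threshold <- 0.3  # illustrative cut-off; the source does not show the value used

# augment() (broom) adds .fitted; type.predict = "response" puts it on the probability scale
risk_m_aug <- augment(risk_m, type.predict = "response") %>%
  mutate(risk_predict = if_else(.fitted > threshold, "1: Yes", "0: No"))

risk_m_aug %>%
  group_by(TenYearCHD, risk_predict) %>%
  summarise(n = n()) %>%
  kable(format = "markdown")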

What percentage of observations are misclassified? What are the disadvantages of relying on a confusion matrix to evaluate model accuracy?
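To make the misclassification question concrete, here is a minimal base R sketch on simulated data (not the Framingham values). The names `x`, `y`, `fit`, `p_hat` and the cut-off of 0.3 are assumptions chosen for illustration only.

```r
set.seed(1)

# Simulated stand-in data: one predictor and a binary outcome
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-1.5 + x))

fit   <- glm(y ~ x, family = binomial)
p_hat <- fitted(fit)              # predicted probabilities

threshold  <- 0.3                 # illustrative cut-off (an assumption)
pred_class <- as.integer(p_hat > threshold)

# Rows are observed outcomes, columns are predicted classes
conf_mat <- table(observed  = factor(y, levels = 0:1),
                  predicted = factor(pred_class, levels = 0:1))
conf_mat

# Share of observations whose predicted class differs from the observed outcome
misclass_rate <- mean(pred_class != y)
misclass_rate
```

Rerunning the sketch with a different `threshold` changes the matrix, which is one reason a single confusion matrix gives only a partial picture of model performance.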

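ROC curve

library(plotROC)  # provides geom_roc() and calc_auc()

# Aesthetics assumed: d = observed outcome (0/1), m = predicted probability
roc_curve <- ggplot(risk_m_aug,
                    aes(d = as.numeric(as.character(TenYearCHD)), m = .fitted)) +
  geom_roc(n.cuts = 10, labelround = 3) +
  geom_abline(intercept = 0)
roc_curve

calc_auc(roc_curve)$AUC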

A doctor plans to use the results of your model to help select patients for a new heart disease prevention program. She asks you which threshold would be best for selecting patients for the program. Based on the ROC curve, what threshold would you recommend to the doctor? Why?
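One way to explore the threshold question is to tabulate the true positive rate and false positive rate over a grid of cut-offs. The sketch below does this in base R on simulated data and picks the cut-off that maximizes Youden's J (TPR minus FPR). Youden's J is only one common criterion and is an assumption here, not the article's method; a prevention program might reasonably accept more false positives in exchange for higher sensitivity.

```r
set.seed(2)

# Simulated stand-in data (not the Framingham values)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-1.5 + x))
p_hat <- fitted(glm(y ~ x, family = binomial))

# Grid of candidate thresholds: 0.05, 0.10, ..., 0.95
thresholds <- seq(5, 95, by = 5) / 100

roc_tab <- data.frame(
  threshold = thresholds,
  # True positive rate: share of observed 1s predicted as 1
  tpr = sapply(thresholds, function(t) mean(p_hat[y == 1] > t)),
  # False positive rate: share of observed 0s predicted as 1
  fpr = sapply(thresholds, function(t) mean(p_hat[y == 0] > t))
)

# Youden's J statistic for each threshold
roc_tab$youden <- roc_tab$tpr - roc_tab$fpr

best <- roc_tab[which.max(roc_tab$youden), ]
best
```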

Assumptions

Why don't we plot the raw residuals?

# .resid is assumed to hold the raw residuals (observed outcome minus predicted probability)
ggplot(data = risk_m_aug, aes(x = .fitted, y = .resid)) +
  geom_point() +
  labs(x = "Predicted value", y = "Raw residual")

Binned residual plot

# binnedplot() comes from the arm package
arm::binnedplot(x = risk_m_aug$.fitted, y = risk_m_aug$.resid,
                xlab = "Predicted probability",
                main = "Binned residuals vs. predicted values")

risk_m_aug %>%
  group_by(currentSmoker) %>%
  summarise(mean_resid = mean(.resid))

##   currentSmoker mean_resid
##   <fct>              <dbl>
## 1 0             -2.504e-14
## 2 1             -2.504e-14
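To show what binning does, the sketch below computes binned residuals by hand in base R on simulated data. The data, the choice of 20 equal-sized bins, and all variable names are assumptions for illustration; raw residuals for a binary outcome only take two values per fitted probability, so averaging them within bins of similar predicted probability gives a more readable diagnostic.

```r
set.seed(3)

# Simulated stand-in data (not the Framingham values)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * x))
fit <- glm(y ~ x, family = binomial)

p_hat     <- fitted(fit)   # predicted probabilities
raw_resid <- y - p_hat     # raw (response) residuals

# Sort observations by predicted probability and split them into 20 equal-sized bins
bins <- cut(rank(p_hat, ties.method = "first"), breaks = 20, labels = FALSE)

# Average predicted probability and average residual within each bin
binned <- data.frame(
  mean_pred  = as.vector(tapply(p_hat, bins, mean)),
  mean_resid = as.vector(tapply(raw_resid, bins, mean))
)
head(binned)
```

Plotting `binned$mean_resid` against `binned$mean_pred` and looking for points scattered around zero with no clear pattern is the by-hand equivalent of a binned residual plot.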

Check the assumptions:

  • Linearity?
  • Randomness?
  • Independence?

Coefficient inference

How is the test statistic for currentSmoker1 calculated? Is totCholCent a statistically significant predictor of a person's high risk of coronary heart disease? Justify your answer using the test statistic and p-value, then state your conclusion using the confidence interval.
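As a sketch of how such a test statistic is formed, the code below computes a Wald z statistic, its two-sided p-value, and a 95% Wald confidence interval by hand for one coefficient. The model is fit to simulated data, so the numbers do not correspond to `risk_m`; the variable names are assumptions.

```r
set.seed(4)

# Simulated stand-in data (not the Framingham values)
n <- 800
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-1 + 0.5 * x))
fit <- glm(y ~ x, family = binomial)

coefs <- summary(fit)$coefficients
est <- coefs["x", "Estimate"]
se  <- coefs["x", "Std. Error"]

# Wald test statistic: estimate divided by its standard error
z <- est / se

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z))

# 95% Wald confidence interval on the log-odds scale
ci <- est + c(-1, 1) * qnorm(0.975) * se

# Exponentiate to express the interval as an odds ratio
exp(ci)
```

If the interval on the log-odds scale excludes 0 (equivalently, the odds-ratio interval excludes 1), the conclusion matches a significant Wald test at the 5% level.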

 

Drop-in-deviance test

 glm(TenYearCHD ~ ageCent + currentSmoker + totChol, 
              data = heart_data, family = binomial)
 

anova 

AIC

Based on the drop-in-deviance test, which model would you choose? Which model would you choose based on AIC?
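The comparison can be sketched on simulated data as follows. The two nested models, `reduced` and `full`, and the predictors `x1` and `x2` are hypothetical stand-ins, not the article's models. The drop-in-deviance statistic is the difference in residual deviance, compared with a chi-squared distribution whose degrees of freedom equal the number of added parameters.

```r
set.seed(5)

# Simulated stand-in data (not the Framingham values)
n  <- 800
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(-1 + 0.6 * x1 + 0.4 * x2))

reduced <- glm(y ~ x1, family = binomial)
full    <- glm(y ~ x1 + x2, family = binomial)

# Drop-in-deviance statistic and its degrees of freedom, computed by hand
g_stat  <- deviance(reduced) - deviance(full)
df_diff <- df.residual(reduced) - df.residual(full)
p_value <- pchisq(g_stat, df = df_diff, lower.tail = FALSE)

# The same test via anova(), plus AIC for both models
dev_table <- anova(reduced, full, test = "Chisq")
dev_table
AIC(reduced, full)
```

A small p-value favors the larger model, and a lower AIC favors whichever model has it; the two criteria usually agree but are not guaranteed to.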

Model selection with stepwise regression

selected_model <- step(full_model)  # selected_model is an illustrative name

tidy(selected_model) %>%
  kable(format = "markdown")
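For readers without the data at hand, here is a minimal sketch of backward stepwise selection with `step()` on simulated data. The data frame `sim`, its columns, and the deliberately irrelevant predictor `noise` are all assumptions made for illustration.

```r
set.seed(6)

# Simulated stand-in data: two real predictors and one pure-noise predictor
n   <- 800
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * sim$x1 + 0.5 * sim$x2))

full_sim <- glm(y ~ x1 + x2 + noise, data = sim, family = binomial)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step log
selected <- step(full_sim, direction = "backward", trace = 0)

formula(selected)
AIC(full_sim, selected)
```

Because backward selection only drops a term when doing so lowers AIC, the selected model's AIC is never higher than that of the starting model.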

