Original link: tecdat.cn/?p=22410
Original source: the Tecdat (Tuoduan Data Tribe) WeChat official account
This article works through a logistic regression analysis from start to finish, to give a basic idea of the analytical steps and the reasoning behind them.
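library(tidyverse)
library(broom)    # tidy() and augment() for model output
library(knitr)    # kable() for formatted tables
library(plotROC)  # geom_roc() and calc_auc() for the ROC curve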
The data come from an ongoing cardiovascular study of residents of the town of Framingham, Massachusetts. The goal is to predict whether a patient has a 10-year risk of future coronary heart disease. The data set includes the following variables:
- male: 0 = female; 1 = male
- age: Age of the patient (years)
- education: 1 = some high school; 2 = high school; 3 = some university or vocational school; 4 = university or above
- currentSmoker: 0 = non-smoker; 1 = smoker
- cigsPerDay: Number of cigarettes smoked per day (estimated average)
- BPMeds: 0 = not on blood pressure medication; 1 = on blood pressure medication
- prevalentStroke: 0 = stroke not prevalent in family history; 1 = stroke prevalent in family history
- prevalentHyp: 0 = hypertension not prevalent in family history; 1 = hypertension prevalent in family history
- diabetes: 0 = no; 1 = yes
- totChol: Total cholesterol (mg/dL)
- sysBP: Systolic blood pressure (mmHg)
- diaBP: Diastolic blood pressure (mmHg)
- BMI: Body mass index
- heartRate: Heart rate
- glucose: Total glucose (mg/dL)
- TenYearCHD: 0 = patient does not have a 10-year risk of future coronary heart disease; 1 = patient has a 10-year risk of future coronary heart disease
Load and prepare the data
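heart_data <- read_csv("framingham.csv") %>%
  drop_na() %>%  # remove observations with missing values
  mutate(currentSmoker = as.factor(currentSmoker),  # treat smoking status as categorical
         ageCent = age - mean(age),                 # center age at its sample mean
         totCholCent = totChol - mean(totChol))     # center total cholesterol at its sample mean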
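Centering leaves the slopes unchanged and only moves the reference point of the intercept to a patient of average age and average cholesterol. A minimal sketch of the centering step, using a made-up toy data frame (the values are illustrative, not taken from the Framingham data):

```r
# Toy data standing in for framingham.csv (values are invented for illustration)
toy <- data.frame(age = c(39, 46, 48, 61, 52),
                  totChol = c(195, 250, 245, 225, 285))

# Center each predictor at its sample mean
toy$ageCent <- toy$age - mean(toy$age)
toy$totCholCent <- toy$totChol - mean(toy$totChol)

# Centered variables average to zero (up to floating-point error)
centered_means <- colMeans(toy[, c("ageCent", "totCholCent")])
print(round(centered_means, 10))
```

After centering, a value of 0 for ageCent corresponds to the mean age in the sample rather than to a newborn, which makes the intercept interpretable.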
Fitting the logistic regression model
risk_m <- glm(TenYearCHD ~ ageCent + currentSmoker + totCholCent,
              data = heart_data, family = binomial)
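For reference, a model with centered age, smoking status and centered total cholesterol as predictors can be written out as below. The notation is ours, not the original article's: $\pi_i$ denotes the probability that TenYearCHD = 1 for patient $i$, and currentSmoker enters as a 0/1 indicator.

```latex
\eta_i = \log\left(\frac{\pi_i}{1-\pi_i}\right)
       = \beta_0 + \beta_1\,\text{ageCent}_i
                 + \beta_2\,\text{currentSmoker}_i
                 + \beta_3\,\text{totCholCent}_i,
\qquad
\pi_i = \frac{\exp(\eta_i)}{1+\exp(\eta_i)}
```

The left-hand form is the log-odds scale on which the coefficients are estimated; the right-hand form converts a log-odds value back to a probability.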
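Prediction
For a new patient
x0 <- tibble(ageCent = (60 - 49.552),        # age 60, centered at the sample mean
             totCholCent = (263 - 236.848),  # total cholesterol 263 mg/dL, centered
             currentSmoker = as.factor(0))   # smoking status is not shown in the original; non-smoker assumed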
Predicted log-odds
predict(risk_m, x0)
Predicted probability
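The code for this step is missing from the text, so here is a minimal sketch of how a predicted log-odds value is turned into a probability. It runs on simulated data, not the Framingham file; the names `sim`, `fit` and `new_patient` and all coefficients used in the simulation are assumptions made for illustration.

```r
# Simulated stand-in for the Framingham data (not the real study values)
set.seed(42)
n <- 400
sim <- data.frame(ageCent = rnorm(n, 0, 8.5),
                  totCholCent = rnorm(n, 0, 44),
                  currentSmoker = factor(rbinom(n, 1, 0.5)))
eta <- -1.9 + 0.08 * sim$ageCent + 0.4 * (sim$currentSmoker == "1") +
  0.002 * sim$totCholCent
sim$TenYearCHD <- rbinom(n, 1, plogis(eta))

fit <- glm(TenYearCHD ~ ageCent + currentSmoker + totCholCent,
           data = sim, family = binomial)

# Hypothetical new patient, expressed on the centered scale
new_patient <- data.frame(ageCent = 10, totCholCent = 26,
                          currentSmoker = factor(0, levels = c(0, 1)))

log_odds <- predict(fit, new_patient)                         # link scale (log-odds)
prob_manual <- exp(log_odds) / (1 + exp(log_odds))            # inverse logit by hand
prob_direct <- predict(fit, new_patient, type = "response")   # same value from predict()
print(c(log_odds = unname(log_odds), probability = unname(prob_manual)))
```

Both routes give the same number; `type = "response"` simply applies the inverse logit for you.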
Based on this probability, do you think this patient has a high risk of coronary heart disease in the next 10 years? Why is that?
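Confusion matrix
threshold <- 0.5  # assumed cutoff; the original value is not shown
risk_m_aug <- augment(risk_m, type.predict = "response") %>%  # .fitted = predicted probability
  mutate(risk_predict = if_else(.fitted > threshold, "1: Yes", "0: No"))
risk_m_aug %>%
  group_by(TenYearCHD, risk_predict) %>%
  summarise(n = n()) %>%  # count observations in each observed/predicted cell
  kable(format = "markdown")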
What percentage of observations are misclassified? What are the disadvantages of relying on a confusion matrix alone to evaluate model accuracy?
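As a sketch of the arithmetic behind the first question, the block below builds a confusion matrix and a misclassification rate in base R. The data are simulated and the 0.5 threshold is an assumption, so the numbers do not correspond to the Framingham results.

```r
# Simulated outcome with a single predictor (illustrative values only)
set.seed(7)
n <- 500
ageCent <- rnorm(n, 0, 8.5)
chd <- rbinom(n, 1, plogis(-1.9 + 0.08 * ageCent))
fit <- glm(chd ~ ageCent, family = binomial)

threshold <- 0.5           # assumed cutoff
p_hat <- fitted(fit)       # predicted probabilities
predicted <- factor(ifelse(p_hat > threshold, "1: Yes", "0: No"),
                    levels = c("0: No", "1: Yes"))

# Rows: observed outcome; columns: predicted class
conf_mat <- table(observed = chd, predicted = predicted)
print(conf_mat)

# Share of observations whose predicted class differs from the observed one
misclass_rate <- mean((p_hat > threshold) != (chd == 1))
print(misclass_rate)
```

The rate is the sum of the two off-diagonal cells divided by the number of observations, and it changes whenever the threshold changes.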
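ROC curve
roc_curve <- ggplot(risk_m_aug,
                    aes(d = TenYearCHD, m = .fitted)) +  # d = observed outcome (0/1), m = predicted probability
  geom_roc(n.cuts = 10, labelround = 3) +
  geom_abline(intercept = 0)  # diagonal reference line
roc_curve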
calc_auc(roc_curve)$AUC
A doctor plans to use the results of your model to help select patients for a new heart disease prevention program. She asks you which threshold would be best for selecting patients for the program. Based on the ROC curve, which threshold would you recommend to the doctor? Why?
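To make the trade-off concrete, the sketch below computes sensitivity and specificity over a grid of thresholds and the AUC from ranks, in base R on simulated data. Youden's J (sensitivity + specificity - 1) is used here only as one possible summary; it is our choice for illustration, not a criterion prescribed by the original text, and a screening program may reasonably weight sensitivity more heavily.

```r
# Simulated outcome with a single predictor (illustrative values only)
set.seed(11)
n <- 500
ageCent <- rnorm(n, 0, 8.5)
chd <- rbinom(n, 1, plogis(-1.9 + 0.08 * ageCent))
p_hat <- fitted(glm(chd ~ ageCent, family = binomial))

# Sensitivity and specificity at each candidate threshold
thresholds <- seq(0.05, 0.95, by = 0.05)
roc <- as.data.frame(t(sapply(thresholds, function(th) {
  pred <- p_hat > th
  c(threshold = th,
    sensitivity = sum(pred & chd == 1) / sum(chd == 1),
    specificity = sum(!pred & chd == 0) / sum(chd == 0))
})))

# One possible summary of the trade-off: Youden's J
roc$youden_j <- roc$sensitivity + roc$specificity - 1
best <- roc[which.max(roc$youden_j), ]
print(best)

# AUC via the rank (Mann-Whitney) formula
r <- rank(p_hat)
n1 <- sum(chd == 1)
n0 <- sum(chd == 0)
auc <- (sum(r[chd == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
print(auc)
```

Lowering the threshold raises sensitivity at the cost of specificity, so the recommended cutoff depends on how costly a missed high-risk patient is relative to an unnecessary referral.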
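Checking assumptions
Why don’t we plot the raw residuals?
risk_m_aug <- risk_m_aug %>% mutate(.resid = residuals(risk_m, type = "response"))  # raw residual: observed minus predicted probability
ggplot(data = risk_m_aug, aes(x = .fitted, y = .resid)) +
  geom_point() +
  labs(x = "Predicted value", y = "Raw residual")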
Binned residual plot
arm::binnedplot(x = risk_m_aug$.fitted, y = risk_m_aug$.resid,  # binnedplot() from the arm package is assumed; the call is truncated in the original
                xlab = "Predicted probability",
                main = "Binned residuals vs. predicted values")
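risk_m_aug %>%
  group_by(currentSmoker) %>%
  summarise(mean_resid = mean(.resid))  # mean residual within each smoking group
## # A tibble: 2 x 2
##   currentSmoker mean_resid
##   <fct>              <dbl>
## 1 0              -2.504e-14
## 2 1              -2.504e-14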
Check the assumptions:
- Linearity?
- Randomness?
- Independence?
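A binned residual plot averages the raw residuals within groups of observations that have similar predicted probabilities. The sketch below does the binning by hand in base R on simulated data; the choice of 20 equal-sized bins is an assumption for illustration.

```r
# Simulated outcome with a single predictor (illustrative values only)
set.seed(3)
n <- 600
ageCent <- rnorm(n, 0, 8.5)
chd <- rbinom(n, 1, plogis(-1.9 + 0.08 * ageCent))
fit <- glm(chd ~ ageCent, family = binomial)

p_hat <- fitted(fit)
raw_resid <- chd - p_hat   # raw (response) residuals

# Split observations into equal-sized bins ordered by predicted probability
n_bins <- 20               # assumed number of bins
bin <- cut(rank(p_hat, ties.method = "first"), breaks = n_bins, labels = FALSE)

# Average predicted probability and average residual within each bin
binned <- data.frame(mean_fitted = as.numeric(tapply(p_hat, bin, mean)),
                     mean_resid = as.numeric(tapply(raw_resid, bin, mean)))
print(head(binned))
```

Plotting `binned$mean_resid` against `binned$mean_fitted` gives the binned residual plot; points scattered around zero with no trend are consistent with the linearity assumption.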
Inference for coefficients
How is the test statistic for currentSmoker1 calculated? Is totCholCent a statistically significant predictor of whether a person is at high risk of coronary heart disease? Justify your answer using the test statistic and p-value, then state your conclusion using the confidence interval.
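The test statistic reported by `glm()` for each coefficient is a Wald z statistic: the estimate divided by its standard error. The sketch below reproduces it by hand on simulated data; the data, the object names and the simulated coefficient values are assumptions, so the output does not match the Framingham model.

```r
# Simulated stand-in for the Framingham data (not the real study values)
set.seed(5)
n <- 800
sim <- data.frame(ageCent = rnorm(n, 0, 8.5),
                  totCholCent = rnorm(n, 0, 44),
                  currentSmoker = factor(rbinom(n, 1, 0.5)))
eta <- -1.9 + 0.08 * sim$ageCent + 0.4 * (sim$currentSmoker == "1") +
  0.002 * sim$totCholCent
sim$TenYearCHD <- rbinom(n, 1, plogis(eta))
fit <- glm(TenYearCHD ~ ageCent + currentSmoker + totCholCent,
           data = sim, family = binomial)

coefs <- summary(fit)$coefficients
est <- coefs["currentSmoker1", "Estimate"]
se <- coefs["currentSmoker1", "Std. Error"]

z <- est / se                               # Wald test statistic
p_value <- 2 * pnorm(-abs(z))               # two-sided p-value from the standard normal
ci <- est + c(-1, 1) * qnorm(0.975) * se    # 95% Wald interval on the log-odds scale
print(c(z = z, p_value = p_value, lower = ci[1], upper = ci[2]))
```

Exponentiating the interval endpoints gives an interval for the odds ratio. Note that `confint()` on a glm object uses profile likelihood and may differ slightly from this Wald interval.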
Drop-in-deviance test
glm(TenYearCHD ~ ageCent + currentSmoker + totChol,
data = heart_data, family = binomial)
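anova(model_red, model_full, test = "Chisq")  # model_red and model_full are placeholder names for the reduced and full models; the original call is truncated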
AIC
Based on the drop-in-deviance test, which model would you choose? Which model would you choose based on AIC?
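The original does not show which terms distinguish the two models, so the sketch below uses simulated data and a hypothetical extra predictor (`sysBP`) purely to illustrate the mechanics of the drop-in-deviance test and the AIC comparison. `model_red` and `model_full` are names introduced here.

```r
# Simulated stand-in for the Framingham data (not the real study values)
set.seed(9)
n <- 800
sim <- data.frame(ageCent = rnorm(n, 0, 8.5),
                  totCholCent = rnorm(n, 0, 44),
                  sysBP = rnorm(n, 132, 22),
                  currentSmoker = factor(rbinom(n, 1, 0.5)))
eta <- -4.5 + 0.08 * sim$ageCent + 0.4 * (sim$currentSmoker == "1") +
  0.002 * sim$totCholCent + 0.02 * sim$sysBP
sim$TenYearCHD <- rbinom(n, 1, plogis(eta))

model_red <- glm(TenYearCHD ~ ageCent + currentSmoker + totCholCent,
                 data = sim, family = binomial)
model_full <- update(model_red, . ~ . + sysBP)   # hypothetical extra term

# Drop-in-deviance (likelihood ratio) test for the nested models
dev_test <- anova(model_red, model_full, test = "Chisq")
print(dev_test)

# The same test by hand: difference in residual deviance vs. chi-square
G <- deviance(model_red) - deviance(model_full)
df_diff <- df.residual(model_red) - df.residual(model_full)
p_value <- pchisq(G, df = df_diff, lower.tail = FALSE)

# AIC comparison: lower is preferred
print(AIC(model_red, model_full))
```

The test asks whether the added term reduces the deviance by more than chance would explain, while AIC penalizes the extra parameter directly; the two criteria usually agree but need not.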
Model selection using stepwise regression
selected_model <- step(full_model)  # stepwise selection by AIC, starting from the full model
tidy(selected_model) %>%
  kable(format = "markdown")
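As an illustration of what `step()` does, the sketch below runs backward elimination by AIC on simulated data that contains two pure-noise predictors. The variables `noise1` and `noise2` and the data are invented for this example; the definition of `full_model` in the original is not shown, so this one is an assumption.

```r
# Simulated data with two predictors that are unrelated to the outcome
set.seed(21)
n <- 800
sim <- data.frame(ageCent = rnorm(n, 0, 8.5),
                  totCholCent = rnorm(n, 0, 44),
                  currentSmoker = factor(rbinom(n, 1, 0.5)),
                  noise1 = rnorm(n),
                  noise2 = rnorm(n))
eta <- -1.9 + 0.08 * sim$ageCent + 0.4 * (sim$currentSmoker == "1") +
  0.002 * sim$totCholCent
sim$TenYearCHD <- rbinom(n, 1, plogis(eta))

full_model <- glm(TenYearCHD ~ ageCent + currentSmoker + totCholCent +
                    noise1 + noise2,
                  data = sim, family = binomial)

# Backward elimination: drop terms while doing so lowers the AIC
selected_model <- step(full_model, direction = "backward", trace = 0)
print(formula(selected_model))
print(c(full = AIC(full_model), selected = AIC(selected_model)))
```

Setting `trace = 0` suppresses the step-by-step log; leaving it at the default prints the AIC of every candidate model at each step.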