Original link: tecdat.cn/?p=6488
Original source: Tuoduan Data Tribe (tecdat) official account
Data preparation
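library(dplyr)
library(corrr)

# Select the columns of interest
mydata <- mtcars %>%
  select(mpg, disp, hp, drat, wt, qsec)
# Add a missing value
mydata$hp[3] <- NA
# Inspect the first rows
head(mydata, 3)
##                mpg disp  hp drat   wt qsec
## Mazda RX4     21.0  160 110 3.90 2.62 16.5
## Mazda RX4 Wag 21.0  160 110 3.90 2.88 17.0
## Datsun 710    22.8  108  NA 3.85 2.32 18.6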
Computing the correlation matrix
res.cor <- correlate(mydata)
res.cor
## # A tibble: 6 x 7
##   rowname    mpg   disp     hp    drat     wt    qsec
##   <chr>    <dbl>  <dbl>  <dbl>   <dbl>  <dbl>   <dbl>
## 1 mpg     NA     -0.848 -0.775  0.681  -0.868  0.419
## 2 disp    -0.848 NA      0.786 -0.710   0.888 -0.434
## 3 hp      -0.775  0.786 NA     -0.443   0.651 -0.706
## 4 drat     0.681 -0.710 -0.443 NA      -0.712  0.0912
## 5 wt      -0.868  0.888  0.651 -0.712  NA     -0.175
## 6 qsec     0.419 -0.434 -0.706  0.0912 -0.175 NA
Other arguments of the correlate() function include:
- method: a character string indicating which correlation coefficient (or covariance) to compute. One of "pearson" (default), "kendall" or "spearman".
- diagonal: the value to set the diagonal to (usually a number or NA).
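As a minimal sketch of these two arguments, the call below computes Spearman rank correlations and fills the diagonal with 1 instead of NA. The object name res.spearman is illustrative, and the data setup repeats the preparation step so that the snippet runs on its own.

```r
library(dplyr)
library(corrr)

# Same six columns as in the data preparation step, with one missing hp value
mydata <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec)
mydata$hp[3] <- NA

# Spearman rank correlations; the diagonal is set to 1 rather than NA
res.spearman <- correlate(mydata, method = "spearman", diagonal = 1)
res.spearman
```

Setting the diagonal to a number can be convenient when the result is passed on to code that does not expect missing values.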
Explore the correlation matrix
Filter correlations whose absolute value is above 0.8:
## # A tibble: 6 x 3
##   rowname colname    cor
##   <chr>   <chr>    <dbl>
## 1 disp    mpg     -0.848
## 2 wt      mpg     -0.868
## 3 mpg     disp    -0.848
## 4 wt      disp     0.888
## 5 mpg     wt      -0.868
## 6 disp    wt       0.888
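The code that produced this table is not shown. One way to obtain an equivalent result, sketched here with corrr's stretch() and dplyr's filter(), is given below. This is an assumption about the approach, not the original code: the columns come out as x, y and r rather than rowname, colname and cor, and the row order may differ.

```r
library(dplyr)
library(corrr)

mydata <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec)
mydata$hp[3] <- NA
res.cor <- correlate(mydata)

# stretch() returns one row per variable pair, with columns x, y and r
high_cor <- res.cor %>%
  stretch() %>%
  filter(abs(r) > 0.8)   # rows with a missing r (the diagonal) are dropped
high_cor
```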
Select specific columns/rows
The focus() function works like select() from dplyr, but it also excludes the selected columns from the rows.
- Select the correlations of interest. The selected columns are excluded from the rows:
## # A tibble: 3 x 4
##   rowname    mpg   disp     hp
##   <chr>    <dbl>  <dbl>  <dbl>
## 1 drat     0.681 -0.710 -0.443
## 2 wt      -0.868  0.888  0.651
## 3 qsec     0.419 -0.434 -0.706
- Mirror the selected columns in the rows:
## # A tibble: 3 x 4
##   rowname    mpg   disp     hp
##   <chr>    <dbl>  <dbl>  <dbl>
## 1 mpg     NA     -0.848 -0.775
## 2 disp    -0.848 NA      0.786
## 3 hp      -0.775  0.786 NA
- Remove unwanted columns:
## # A tibble: 3 x 4
##   rowname   drat     wt   qsec
##   <chr>    <dbl>  <dbl>  <dbl>
## 1 mpg      0.681 -0.868  0.419
## 2 disp    -0.710  0.888 -0.434
## 3 hp      -0.443  0.651 -0.706
- Select columns by regular expression:
## # A tibble: 4 x 3
##   rowname   disp    drat
##   <chr>    <dbl>   <dbl>
## 1 mpg     -0.848  0.681
## 2 hp       0.786 -0.443
## 3 wt       0.888 -0.712
## 4 qsec    -0.434  0.0912
- Select columns with a correlation above 0.8:
## # A tibble: 2 x 3
##   rowname   disp     wt
##   <chr>    <dbl>  <dbl>
## 1 disp    NA      0.888
## 2 wt       0.888 NA
- Focus on the correlation of one variable with all the others:
Extract the correlation coefficients:
## # A tibble: 5 x 2
##   rowname    mpg
##   <chr>    <dbl>
## 1 disp    -0.848
## 2 hp      -0.775
## 3 drat     0.681
## 4 wt      -0.868
## 5 qsec     0.419
The correlations between mpg and the other variables can then be plotted.
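The calls behind the outputs in this list are not shown. The sketch below gives focus() and focus_if() calls that should correspond to each item, assuming the res.cor object built earlier. The helper any_over_80, the column name variable and the plot styling are illustrative choices, not taken from the original.

```r
library(dplyr)
library(corrr)
library(ggplot2)

mydata <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec)
mydata$hp[3] <- NA
res.cor <- correlate(mydata)

# Keep mpg, disp and hp as columns; they are dropped from the rows
res.cor %>% focus(mpg, disp, hp)

# mirror = TRUE keeps the same variables in the rows as well
res.cor %>% focus(mpg, disp, hp, mirror = TRUE)

# Negative selection removes columns; the removed variables form the rows
res.cor %>% focus(-mpg, -disp, -hp)

# Select columns whose names match a regular expression (here: start with "d")
res.cor %>% focus(matches("^d"))

# Keep only the columns that contain at least one correlation above 0.8
any_over_80 <- function(x) any(x > 0.8, na.rm = TRUE)  # hypothetical helper
res.cor %>% focus_if(any_over_80, mirror = TRUE)

# Correlations of mpg with every other variable, as a plain data frame
mpg_cor <- as.data.frame(focus(res.cor, mpg))
# The first column is "rowname" or "term" depending on the corrr version
names(mpg_cor)[1] <- "variable"

# Bar chart of those correlations, ordered by value
p <- ggplot(mpg_cor, aes(x = reorder(variable, mpg), y = mpg)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Correlation with mpg")
p
```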
Reorder the correlation matrix
## # A tibble: 6 x 7
##   rowname     wt    drat   disp    mpg     hp    qsec
##   <chr>    <dbl>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>
## 1 wt      NA     -0.712   0.888 -0.868  0.651 -0.175
## 2 drat    -0.712 NA      -0.710  0.681 -0.443  0.0912
## 3 disp     0.888 -0.710  NA     -0.848  0.786 -0.434
## 4 mpg     -0.868  0.681  -0.848 NA     -0.775  0.419
## 5 hp       0.651 -0.443   0.786 -0.775 NA     -0.706
## 6 qsec    -0.175  0.0912 -0.434  0.419 -0.706 NA
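The reordering call itself is not shown. In corrr this is done with rearrange(), which groups highly correlated variables next to each other. The sketch below uses its default settings, so the resulting order is not guaranteed to match the output above, which depends on the method and absolute arguments that were actually used.

```r
library(dplyr)
library(corrr)

mydata <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec)
mydata$hp[3] <- NA
res.cor <- correlate(mydata)

# Reorder rows and columns so that strongly correlated variables are adjacent
reordered <- res.cor %>% rearrange()
reordered
```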
Upper/lower triangle
Set the upper or lower triangle to missing values:
res.cor %>% shave()
## # A tibble: 6 x 7
##   rowname    mpg   disp     hp    drat     wt  qsec
##   <chr>    <dbl>  <dbl>  <dbl>   <dbl>  <dbl> <dbl>
## 1 mpg     NA     NA     NA     NA      NA        NA
## 2 disp    -0.848 NA     NA     NA      NA        NA
## 3 hp      -0.775  0.786 NA     NA      NA        NA
## 4 drat     0.681 -0.710 -0.443 NA      NA        NA
## 5 wt      -0.868  0.888  0.651 -0.712  NA        NA
## 6 qsec     0.419 -0.434 -0.706  0.0912 -0.175    NA
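By default shave() blanks the upper triangle. As a short sketch, passing upper = FALSE blanks the lower triangle instead. The object name upper_only is illustrative.

```r
library(dplyr)
library(corrr)

mydata <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec)
mydata$hp[3] <- NA
res.cor <- correlate(mydata)

# Set the lower triangle to NA and keep the upper triangle
upper_only <- res.cor %>% shave(upper = FALSE)
upper_only
```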
Stretch the data into a long format
res.cor %>% stretch()
## # A tibble: 36 x 3
##   x     y          r
##   <chr> <chr>  <dbl>
## 1 mpg   mpg   NA
## 2 mpg   disp  -0.848
## 3 mpg   hp    -0.775
## 4 mpg   drat   0.681
## 5 mpg   wt    -0.868
## 6 mpg   qsec   0.419
## # ... with 30 more rows
Combine the tidyverse and corrr packages to work with correlations
Visualize the distribution of the correlation coefficients:
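The plotting code is missing here. A minimal sketch, assuming a ggplot2 histogram is what was intended, keeps each variable pair once with shave() and stretch(na.rm = TRUE) and then plots the r column. The number of bins is an arbitrary choice.

```r
library(dplyr)
library(corrr)
library(ggplot2)

mydata <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec)
mydata$hp[3] <- NA
res.cor <- correlate(mydata)

# One row per unique variable pair: shave() blanks the upper triangle,
# stretch(na.rm = TRUE) drops the blanked entries
cor_long <- res.cor %>%
  shave() %>%
  stretch(na.rm = TRUE)

# Histogram of the correlation coefficients
p <- ggplot(cor_long, aes(x = r)) +
  geom_histogram(bins = 10)
p
```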
Rearrange and filter the correlation matrix:
res.cor %>%
focus(mpg:drat, mirror = TRUE) %>%
## # A tibble: 3 x 4
##   rowname    mpg   disp   drat
##   <chr>    <dbl>  <dbl>  <dbl>
## 1 hp      -0.775  0.786 -0.443
## 2 mpg     NA     -0.848  0.681
## 3 disp    NA     NA     -0.710
Interpreting correlations
##   rowname  mpg disp   hp drat   wt qsec
## 1     mpg      -.85 -.77  .68 -.87  .42
## 2    disp -.85       .79 -.71  .89 -.43
## 3      hp -.77  .79      -.44  .65 -.71
## 4    drat  .68 -.71 -.44      -.71  .09
## 5      wt -.87  .89  .65 -.71      -.17
## 6    qsec  .42 -.43 -.71  .09 -.17
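Output in this compact style is what corrr's fashion() produces: it rounds the coefficients, drops leading zeros and prints missing values as blanks. The call is not shown in the text, so the sketch below is an assumption about how the table was generated.

```r
library(dplyr)
library(corrr)

mydata <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec)
mydata$hp[3] <- NA
res.cor <- correlate(mydata)

# Format the correlations for reading: two decimals, no leading zeros, blank NAs
fashioned <- res.cor %>% fashion()
fashioned

# The number of decimals can be changed
res.cor %>% fashion(decimals = 1)
```

The result is meant for display only, since the values are converted to text.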
res.cor %>%
focus(mpg:drat, mirror = TRUE)
##   rowname  mpg disp drat
## 1      hp -.77  .79 -.44
## 2     mpg      -.85  .68
## 3    disp           -.71
- Plot the correlation matrix
- Rearrange and plot the lower triangle
- Make a network plot
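The plotting calls for these three items are missing. A minimal sketch using corrr's rplot() and network_plot() with mostly default settings follows. The min_cor threshold of 0.3 and the object names are illustrative, and the original plots may have used different styling.

```r
library(dplyr)
library(corrr)

mydata <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec)
mydata$hp[3] <- NA
res.cor <- correlate(mydata)

# Plot the full correlation matrix
p_matrix <- res.cor %>% rplot()

# Rearrange, blank the upper triangle, then plot the lower triangle
p_lower <- res.cor %>%
  rearrange() %>%
  shave() %>%
  rplot()

# Network plot: only correlations with absolute value of at least 0.3 are drawn
p_network <- res.cor %>% network_plot(min_cor = 0.3)

p_matrix
p_lower
p_network
```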
Correlate data in a database
- Using an SQLite database:
# In-memory SQLite database (dbname is the RSQLite argument for the database location)
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
db_mtcars <- copy_to(con, mtcars)
class(db_mtcars)
correlate() detects the database backend, uses tidyeval to calculate the correlations inside the database, and returns a correlation data frame.
db_mtcars %>% correlate(use = "complete.obs")
- Using Spark:
sc <- sparklyr::spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars)
correlate(mtcars_tbl, use = "complete.obs")