The following packages are required for this practical:
library(dplyr)
library(magrittr)
library(mice)
and if you’d like the same results as I have obtained, you can fix the random seed
set.seed(123)
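Draw 1000 values from a normal distribution with mean = 5 and sd = 1, \(N(5, 1)\), arrange them in a two-column matrix, and plot the two columns against each other:
rnorm(1000, 5) %>%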
matrix(ncol = 2) %>%
plot()
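To see what the pipe does here, it may help to compare it with the nested call it replaces. The sketch below assumes only that magrittr is installed; the object names piped and nested are mine and are used purely for illustration. The seed is reset before each draw so that both versions receive the same random values.

```r
library(magrittr)

# Piped version: each result is passed on as the first argument of the next call
set.seed(123)
piped <- rnorm(1000, 5) %>%
  matrix(ncol = 2)

# Nested version: the same steps, read from the inside out
set.seed(123)
nested <- matrix(rnorm(1000, 5), ncol = 2)

identical(piped, nested)  # both approaches build the same matrix
dim(piped)                # 500 rows, 2 columns (matrix() fills column by column)
```

Because matrix() fills column-wise, the first 500 draws end up in the first column and the remaining 500 in the second.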
Calculate the correlation matrix of the anscombe data set:
anscombe %>%
cor()
## x1 x2 x3 x4 y1 y2 y3
## x1 1.0000000 1.0000000 1.0000000 -0.5000000 0.8164205 0.8162365 0.8162867
## x2 1.0000000 1.0000000 1.0000000 -0.5000000 0.8164205 0.8162365 0.8162867
## x3 1.0000000 1.0000000 1.0000000 -0.5000000 0.8164205 0.8162365 0.8162867
## x4 -0.5000000 -0.5000000 -0.5000000 1.0000000 -0.5290927 -0.7184365 -0.3446610
## y1 0.8164205 0.8164205 0.8164205 -0.5290927 1.0000000 0.7500054 0.4687167
## y2 0.8162365 0.8162365 0.8162365 -0.7184365 0.7500054 1.0000000 0.5879193
## y3 0.8162867 0.8162867 0.8162867 -0.3446610 0.4687167 0.5879193 1.0000000
## y4 -0.3140467 -0.3140467 -0.3140467 0.8165214 -0.4891162 -0.4780949 -0.1554718
## y4
## x1 -0.3140467
## x2 -0.3140467
## x3 -0.3140467
## x4 0.8165214
## y1 -0.4891162
## y2 -0.4780949
## y3 -0.1554718
## y4 1.0000000
Calculate the correlation for the pair (x4, y4) on the anscombe data set.
Using the standard %>% pipe:
anscombe %>%
subset(select = c(x4, y4)) %>%
cor()
## x4 y4
## x4 1.0000000 0.8165214
## y4 0.8165214 1.0000000
Alternatively, we can use the %$% pipe from package
magrittr, which lets us refer to x4 and y4 by name and makes the code more concise.
anscombe %$%
cor(x4, y4)
## [1] 0.8165214
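As a rough mental model, the %$% pipe behaves much like base R's with(): the columns of the left-hand side become available by name inside the right-hand expression. The sketch below compares three ways of writing the same correlation; the object names r_pipe, r_with and r_dollar are only illustrative.

```r
library(magrittr)

# Exposition pipe: columns of anscombe are visible inside cor()
r_pipe <- anscombe %$% cor(x4, y4)

# Base R counterpart using with()
r_with <- with(anscombe, cor(x4, y4))

# Explicit extraction with $
r_dollar <- cor(anscombe$x4, anscombe$y4)

all.equal(r_pipe, r_with)
all.equal(r_pipe, r_dollar)
```

All three calls hand the same two vectors to cor(), so they differ only in how readable they are inside a longer pipeline.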
Calculate the correlation between hgt and wgt in the boys data set
from package mice.
Because boys has missing values for almost all
variables, we must first select wgt and hgt
and then omit the rows that have missing values, before we can calculate
the correlation. Using the standard %>% pipe, this would
look like:
boys %>%
subset(select = c("wgt", "hgt")) %>%
cor(use = "pairwise.complete.obs")
## wgt hgt
## wgt 1.0000000 0.9428906
## hgt 0.9428906 1.0000000
which, with only two variables selected, is equivalent to
boys %>%
subset(select = c("wgt", "hgt")) %>%
na.omit() %>%
cor()
## wgt hgt
## wgt 1.0000000 0.9428906
## hgt 0.9428906 1.0000000
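The two approaches agree here because only two variables are involved: pairwise deletion and row-wise deletion then drop exactly the same rows. With three or more variables they can differ. The sketch below illustrates this on a small made-up data frame (toy and its columns a, b and c are hypothetical and not part of the practical).

```r
library(magrittr)

# Hypothetical toy data: only column c has missing values
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2, 1, 4, 3, 6, 5),
  c = c(NA, 1, 2, 4, 3, NA)
)

# Pairwise deletion: cor(a, b) uses all six rows
pairwise <- toy %>% cor(use = "pairwise.complete.obs")

# Row-wise deletion: rows 1 and 6 are dropped for every pair
listwise <- toy %>% na.omit() %>% cor()

pairwise["a", "b"]
listwise["a", "b"]

# With only two columns selected, both strategies coincide
two_pair <- toy %>% subset(select = c(a, c)) %>% cor(use = "pairwise.complete.obs")
two_list <- toy %>% subset(select = c(a, c)) %>% na.omit() %>% cor()
all.equal(two_pair, two_list)
```

In the three-column case the correlation between a and b changes depending on the strategy, which is why it matters to select the variables before removing incomplete rows.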
Alternatively, we can use the %$% pipe:
boys %$%
cor(hgt, wgt, use = "pairwise.complete.obs")
## [1] 0.9428906
The %$% pipe exposes the columns of
the boys data set, such that we can refer to them
directly by name.
In the boys data set, hgt is
recorded in centimeters. Use a pipe to transform hgt in the
boys data set to height in meters and verify the
transformation.
Using the standard %>% and the %$%
pipes:
boys %>%
transform(hgt = hgt / 100) %$%
mean(hgt, na.rm = TRUE)
## [1] 1.321518
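One possible way to verify the transformation more explicitly is to compare the mean before and after rescaling: the mean in meters should equal the mean in centimeters divided by 100. This is only a sketch; the names mean_cm and mean_m are illustrative, and it assumes the mice package is installed so that boys is available.

```r
library(magrittr)
library(mice)

# Mean height on the original centimeter scale
mean_cm <- boys %$%
  mean(hgt, na.rm = TRUE)

# Mean height after converting to meters inside the pipe
mean_m <- boys %>%
  transform(hgt = hgt / 100) %$%
  mean(hgt, na.rm = TRUE)

# The two means should differ by a factor of 100
all.equal(mean_cm / 100, mean_m)
```

Note that transform() returns a modified copy, so the boys data set itself still holds hgt in centimeters after the pipe has run.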
End of Practical