The following packages are required for this practical:
library(dplyr)
library(magrittr)
library(mice)
If you would like to obtain the same results as I have, you can fix the random seed:
set.seed(123)
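As a minimal sketch of what fixing the seed does: resetting the seed to the same value before each draw reproduces exactly the same random numbers. The object names `first_draw` and `second_draw` are only illustrative.

```r
# Draw three values, reset the seed, and draw again.
set.seed(123)
first_draw <- rnorm(3)

set.seed(123)
second_draw <- rnorm(3)

# Both draws are identical because they start from the same seed.
identical(first_draw, second_draw)
```

Note that running this sketch leaves the seed state changed, so call set.seed(123) again before continuing if you want to match the results in this practical.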
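Use a pipe to draw 1000 values from a normal distribution with mean = 5 and sd = 1, that is \(N(5, 1)\), arrange them in a matrix with two columns, and plot the two columns against each other:
rnorm(1000, 5) %>%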
matrix(ncol = 2) %>%
plot()
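To make explicit what the pipe does here, the following sketch compares the piped chain with the equivalent nested call. The %>% pipe passes its left-hand side as the first argument of the function on its right, so both forms should build the same matrix. The names `piped` and `nested` are illustrative, and the plotting step is left out.

```r
library(magrittr)

# Piped version: draw, then reshape into two columns.
set.seed(123)
piped <- rnorm(1000, 5) %>%
  matrix(ncol = 2)

# Nested version of the same computation, read inside-out.
set.seed(123)
nested <- matrix(rnorm(1000, 5), ncol = 2)

identical(piped, nested)  # same values, same shape
dim(piped)                # 500 rows and 2 columns
```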
Use a pipe to calculate the correlation matrix on the anscombe data set:
anscombe %>%
cor()
## x1 x2 x3 x4 y1 y2 y3
## x1 1.0000000 1.0000000 1.0000000 -0.5000000 0.8164205 0.8162365 0.8162867
## x2 1.0000000 1.0000000 1.0000000 -0.5000000 0.8164205 0.8162365 0.8162867
## x3 1.0000000 1.0000000 1.0000000 -0.5000000 0.8164205 0.8162365 0.8162867
## x4 -0.5000000 -0.5000000 -0.5000000 1.0000000 -0.5290927 -0.7184365 -0.3446610
## y1 0.8164205 0.8164205 0.8164205 -0.5290927 1.0000000 0.7500054 0.4687167
## y2 0.8162365 0.8162365 0.8162365 -0.7184365 0.7500054 1.0000000 0.5879193
## y3 0.8162867 0.8162867 0.8162867 -0.3446610 0.4687167 0.5879193 1.0000000
## y4 -0.3140467 -0.3140467 -0.3140467 0.8165214 -0.4891162 -0.4780949 -0.1554718
## y4
## x1 -0.3140467
## x2 -0.3140467
## x3 -0.3140467
## x4 0.8165214
## y1 -0.4891162
## y2 -0.4780949
## y3 -0.1554718
## y4 1.0000000
Use a pipe to calculate the correlation for the pair (x4, y4) on the anscombe data set.
Using the standard %>% pipe:
anscombe %>%
subset(select = c(x4, y4)) %>%
cor()
## x4 y4
## x4 1.0000000 0.8165214
## y4 0.8165214 1.0000000
Alternatively, we can use the %$% pipe from package magrittr to make this process much more concise:
anscombe %$%
cor(x4, y4)
## [1] 0.8165214
Use a pipe to calculate the correlation between hgt and wgt in the boys data set from package mice.
Because boys has missing values for almost all variables, we must first select wgt and hgt and then omit the rows that have missing values, before we can calculate the correlation. Using the standard %>% pipe, this would look like:
boys %>%
subset(select = c("wgt", "hgt")) %>%
cor(use = "pairwise.complete.obs")
## wgt hgt
## wgt 1.0000000 0.9428906
## hgt 0.9428906 1.0000000
which, because only two variables are involved, is equivalent to
boys %>%
subset(select = c("wgt", "hgt")) %>%
na.omit() %>%
cor()
## wgt hgt
## wgt 1.0000000 0.9428906
## hgt 0.9428906 1.0000000
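The equivalence of the two approaches holds because only two variables are selected: pairwise and listwise deletion then drop the same rows. The sketch below checks this, and then adds a third variable, hc (head circumference, also in boys), purely as an illustration of how the two strategies can come to use different numbers of rows. The object names are illustrative.

```r
library(magrittr)
library(mice)  # provides the boys data set

# Two variables: pairwise and listwise deletion use the same rows.
two_vars <- boys %>% subset(select = c(wgt, hgt))
pairwise_two <- two_vars %>% cor(use = "pairwise.complete.obs")
listwise_two <- two_vars %>% na.omit() %>% cor()
all.equal(pairwise_two, listwise_two)

# Three variables: listwise deletion also drops rows where hc is missing,
# whereas the pairwise wgt-hgt correlation ignores hc entirely.
three_vars <- boys %>% subset(select = c(wgt, hgt, hc))
pairwise_three <- three_vars %>% cor(use = "pairwise.complete.obs")
listwise_three <- three_vars %>% na.omit() %>% cor()

# Compare the number of complete rows under each selection.
nrow(na.omit(two_vars))
nrow(na.omit(three_vars))
```

With more than two variables, the results of the two strategies may therefore differ, so it is worth choosing between them deliberately.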
Alternatively, we can use the %$% pipe:
boys %$%
cor(hgt, wgt, use = "pairwise.complete.obs")
## [1] 0.9428906
The %$% pipe exposes the columns of the boys data set to the expression on its right-hand side, such that we can refer to them directly by name.
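One way to think about the %$% pipe is as a piped form of base R's with(). The sketch below compares it with with() and with explicit $ extraction; all three should return the same correlation. The object names are illustrative.

```r
library(magrittr)
library(mice)  # provides the boys data set

# Exposition pipe: columns of boys are available by name.
via_exposition <- boys %$% cor(hgt, wgt, use = "pairwise.complete.obs")

# Base R equivalent using with().
via_with <- with(boys, cor(hgt, wgt, use = "pairwise.complete.obs"))

# Explicit column extraction with $.
via_dollar <- cor(boys$hgt, boys$wgt, use = "pairwise.complete.obs")

all.equal(via_exposition, via_with)
all.equal(via_exposition, via_dollar)
```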
In the boys data set, hgt is recorded in centimeters. Use a pipe to transform hgt in the boys data set to height in meters and verify the transformation.
Using the standard %>% and the %$% pipes:
boys %>%
transform(hgt = hgt / 100) %$%
mean(hgt, na.rm = TRUE)
## [1] 1.321518
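A mean of about 1.32 is plausible for height in meters, but the transformation can also be verified more directly. As a sketch, the mean of the transformed variable should equal the mean of the original variable divided by 100. The names `mean_cm` and `mean_m` are illustrative.

```r
library(magrittr)
library(mice)  # provides the boys data set

# Mean height on the original scale (centimeters).
mean_cm <- boys %$% mean(hgt, na.rm = TRUE)

# Mean height after transforming to meters.
mean_m <- boys %>%
  transform(hgt = hgt / 100) %$%
  mean(hgt, na.rm = TRUE)

# The two means should differ by exactly a factor of 100.
all.equal(mean_cm / 100, mean_m)
```

Because transform() returns a modified copy, the boys data set itself is left unchanged.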
End of Practical