Exercises


The following packages are required for this practical:

library(dplyr)
library(magrittr)
library(mice)

If you would like to reproduce the results shown here, fix the random seed:

set.seed(123)

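  1. Use a pipe to draw 1000 values from a normal distribution with mean 5 and standard deviation 1, arrange them in a two-column matrix, and plot the two columns against each other: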
rnorm(1000, 5) %>%
  matrix(ncol = 2) %>%
  plot()
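For comparison, the pipe above is just another way of writing a nested call. The following is a minimal sketch, assuming magrittr is attached; the object names `piped` and `nested` are illustrative and not part of the practical.

```r
library(magrittr)

# Reset the seed before each draw so both versions see the same random numbers
set.seed(123)
piped <- rnorm(1000, 5) %>%
  matrix(ncol = 2)

set.seed(123)
nested <- matrix(rnorm(1000, 5), ncol = 2)

identical(piped, nested)  # the pipe changes only how the call is written
dim(piped)                # 500 rows, 2 columns
```

The piped version reads in the order the steps are carried out, whereas the nested version has to be read from the inside out.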


  2. Use a pipe to calculate the correlation matrix on the anscombe data set
anscombe %>%
  cor()
##            x1         x2         x3         x4         y1         y2         y3
## x1  1.0000000  1.0000000  1.0000000 -0.5000000  0.8164205  0.8162365  0.8162867
## x2  1.0000000  1.0000000  1.0000000 -0.5000000  0.8164205  0.8162365  0.8162867
## x3  1.0000000  1.0000000  1.0000000 -0.5000000  0.8164205  0.8162365  0.8162867
## x4 -0.5000000 -0.5000000 -0.5000000  1.0000000 -0.5290927 -0.7184365 -0.3446610
## y1  0.8164205  0.8164205  0.8164205 -0.5290927  1.0000000  0.7500054  0.4687167
## y2  0.8162365  0.8162365  0.8162365 -0.7184365  0.7500054  1.0000000  0.5879193
## y3  0.8162867  0.8162867  0.8162867 -0.3446610  0.4687167  0.5879193  1.0000000
## y4 -0.3140467 -0.3140467 -0.3140467  0.8165214 -0.4891162 -0.4780949 -0.1554718
##            y4
## x1 -0.3140467
## x2 -0.3140467
## x3 -0.3140467
## x4  0.8165214
## y1 -0.4891162
## y2 -0.4780949
## y3 -0.1554718
## y4  1.0000000

  3. Now use a pipe to calculate the correlation for the pair (x4, y4) on the anscombe data set

Using the standard %>% pipe:

anscombe %>%
  subset(select = c(x4, y4)) %>%
  cor()
##           x4        y4
## x4 1.0000000 0.8165214
## y4 0.8165214 1.0000000

Alternatively, we can use the %$% pipe from package magrittr, which lets us refer to the columns directly and makes the code more concise.

anscombe %$%
  cor(x4, y4)
## [1] 0.8165214
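To see what %$% does, it may help to compare it with base R's `with()`, which evaluates an expression with the columns of a data frame in scope. This is a small sketch; the names `r_pipe` and `r_with` are illustrative only.

```r
library(magrittr)

# %$% exposes the columns of the left-hand side to the right-hand expression
r_pipe <- anscombe %$% cor(x4, y4)

# with() from base R does the same without a pipe
r_with <- with(anscombe, cor(x4, y4))

all.equal(r_pipe, r_with)
```

Both forms return a single correlation rather than a 2 x 2 matrix, because `cor()` receives two vectors instead of a data frame.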

  4. Use a pipe to calculate the correlation between hgt and wgt in the boys data set from package mice.

Because boys has missing values for almost all variables, we must select wgt and hgt and exclude the rows with missing values before we can calculate the correlation. Using the standard %>% pipe, we can let cor() exclude the incomplete pairs:

boys %>%
  subset(select = c("wgt", "hgt")) %>%
  cor(use = "pairwise.complete.obs")
##           wgt       hgt
## wgt 1.0000000 0.9428906
## hgt 0.9428906 1.0000000

With only two variables, this is equivalent to omitting the incomplete rows first:

boys %>%
  subset(select = c("wgt", "hgt")) %>%
  na.omit() %>%
  cor()
##           wgt       hgt
## wgt 1.0000000 0.9428906
## hgt 0.9428906 1.0000000
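The equivalence of the two approaches can be checked directly. The sketch below assumes magrittr and mice are installed; the object names are illustrative. It also repeats the comparison with a third incomplete column (hc, head circumference in boys) to show where the two approaches may part ways.

```r
library(magrittr)
library(mice)  # provides the boys data

two_cols <- boys %>% subset(select = c("wgt", "hgt"))

# Pairwise deletion versus dropping incomplete rows first
pairwise <- two_cols %>% cor(use = "pairwise.complete.obs")
listwise <- two_cols %>% na.omit() %>% cor()
all.equal(pairwise, listwise)

# With more than two columns the two approaches need not agree, because
# na.omit() drops a row if ANY of the selected columns is missing
three_cols <- boys %>% subset(select = c("wgt", "hgt", "hc"))
pairwise3 <- three_cols %>% cor(use = "pairwise.complete.obs")
listwise3 <- three_cols %>% na.omit() %>% cor()
all.equal(pairwise3, listwise3)
```

With exactly two columns, every complete pair is also a complete row, so both approaches use the same observations.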

Alternatively, we can use the %$% pipe:

boys %$% 
  cor(hgt, wgt, use = "pairwise.complete.obs")
## [1] 0.9428906

The %$% pipe exposes the columns (variables) of the boys data set to the function on its right-hand side, so that we can refer to them directly by name.


  5. In the boys data set, hgt is recorded in centimeters. Use a pipe to transform hgt in the boys data set to height in meters and verify the transformation.

Using the standard %>% and the %$% pipes:

boys %>%
  transform(hgt = hgt / 100) %$%
  mean(hgt, na.rm = TRUE)
## [1] 1.321518
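One way to verify the transformation more explicitly is to compare the mean of the transformed variable with the original mean divided by 100. This is a minimal sketch, assuming magrittr and mice are installed; `mean_cm` and `mean_m` are illustrative names.

```r
library(magrittr)
library(mice)  # provides the boys data

# Mean height in the original unit (centimeters)
mean_cm <- boys %$% mean(hgt, na.rm = TRUE)

# Mean height after converting to meters inside the pipe
mean_m <- boys %>%
  transform(hgt = hgt / 100) %$%
  mean(hgt, na.rm = TRUE)

# The two should agree up to floating-point tolerance
all.equal(mean_m, mean_cm / 100)
```

Note that `transform()` returns a modified copy, so the boys data set itself still holds hgt in centimeters.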

End of Practical


Useful References