Practical E: Solutions

Exercises

Create a for-loop that loops over all numbers between 0 and 10, but only prints numbers below 5.

for (i in 0:10) {
  if (i < 5) {
    print(i)
  }
}

## [1] 0
## [1] 1
## [1] 2
## [1] 3
## [1] 4

Modify the for loop to only print the numbers 3, 4, and 5.

for (i in 0:10) {
  if (i >= 3 & i <= 5) {
    print(i)
  }
}

## [1] 3
## [1] 4
## [1] 5

Or, more concisely:

for (i in 0:10) {
  if (i %in% 3:5) {
    print(i)
  }
}

## [1] 3
## [1] 4
## [1] 5

Try to do the same thing without a for-loop, by subsetting a vector from 0 to 10 directly.

num <- 0:10
num[num >= 3 & num <= 5]

## [1] 3 4 5

or, alternatively,

subset(num, num >= 3 & num <= 5)

## [1] 3 4 5

Recreate the following matrix, where the numbers 1 to 8 are multiplied by 1 in the first row, by 2 in the second row, and so on. Tip: use byrow = TRUE to fill a matrix left-to-right instead of top-to-bottom.

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,]    1    2    3    4    5    6    7    8
## [2,]    2    4    6    8   10   12   14   16
## [3,]    3    6    9   12   15   18   21   24
## [4,]    4    8   12   16   20   24   28   32
## [5,]    5   10   15   20   25   30   35   40

# Create a 5 x 8 matrix in which every row holds the numbers 1 to 8.
mat <- matrix(1:8, ncol = 8, nrow = 5, byrow = TRUE)

# Loop over each row, and multiply it. 
for (i in 1:5) {
  mat[i, ] <- mat[i, ] * i
}

mat

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,]    1    2    3    4    5    6    7    8
## [2,]    2    4    6    8   10   12   14   16
## [3,]    3    6    9   12   15   18   21   24
## [4,]    4    8   12   16   20   24   28   32
## [5,]    5   10   15   20   25   30   35   40
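
The same matrix can also be built without a loop. As a minimal sketch (the name mat2 is an assumption, not part of the exercise), outer() computes the product of every pair of elements from two vectors, so cell [i, j] holds i * j:

```r
# outer() multiplies every element of 1:5 with every element of 1:8.
mat2 <- outer(1:5, 1:8)
mat2
```

This is only an alternative; the loop above is the solution the exercise asks for.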

Create a 6 by 6 matrix of strings, where each cell contains “row + column = sum”. For example, the second row, third column would yield “2 + 3 = 5”. Tip: Create an empty 6x6 matrix first and fill it with values later.

string.mat <- matrix(NA, ncol = 6, nrow = 6)

for (i in 1:6) {
  for (j in 1:6) {
    string.mat[i, j] <- paste(i, "+", j, "=", i+j, sep="")
  }
}

string.mat

##      [,1]    [,2]    [,3]    [,4]     [,5]     [,6]    
## [1,] "1+1=2" "1+2=3" "1+3=4" "1+4=5"  "1+5=6"  "1+6=7" 
## [2,] "2+1=3" "2+2=4" "2+3=5" "2+4=6"  "2+5=7"  "2+6=8" 
## [3,] "3+1=4" "3+2=5" "3+3=6" "3+4=7"  "3+5=8"  "3+6=9" 
## [4,] "4+1=5" "4+2=6" "4+3=7" "4+4=8"  "4+5=9"  "4+6=10"
## [5,] "5+1=6" "5+2=7" "5+3=8" "5+4=9"  "5+5=10" "5+6=11"
## [6,] "6+1=7" "6+2=8" "6+3=9" "6+4=10" "6+5=11" "6+6=12"
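
A loop-free variant is possible here as well. The sketch below assumes the illustrative name string.mat2 and relies on outer() accepting any vectorised function of two arguments:

```r
# outer() passes two equally long vectors of row and column indices to the
# function; paste0() is vectorised, so all 36 strings are built in one call.
string.mat2 <- outer(1:6, 1:6, function(i, j) paste0(i, "+", j, "=", i + j))
string.mat2[2, 3]
```

The nested loop remains easier to extend with conditions, which is what the next exercise asks for.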

Modify your loop to put "Sum > 8" in the matrix in the cells where that is true.

string.mat <- matrix(NA, ncol = 6, nrow = 6)

for (i in 1:6) {
  for (j in 1:6) {
    if (i+j <= 8) {
      string.mat[i, j] <- paste(i, "+", j, "=", i+j, sep="")
    } else {
      string.mat[i, j] <- "Sum > 8"
    }
  }
}

string.mat

##      [,1]    [,2]    [,3]      [,4]      [,5]      [,6]     
## [1,] "1+1=2" "1+2=3" "1+3=4"   "1+4=5"   "1+5=6"   "1+6=7"  
## [2,] "2+1=3" "2+2=4" "2+3=5"   "2+4=6"   "2+5=7"   "2+6=8"  
## [3,] "3+1=4" "3+2=5" "3+3=6"   "3+4=7"   "3+5=8"   "Sum > 8"
## [4,] "4+1=5" "4+2=6" "4+3=7"   "4+4=8"   "Sum > 8" "Sum > 8"
## [5,] "5+1=6" "5+2=7" "5+3=8"   "Sum > 8" "Sum > 8" "Sum > 8"
## [6,] "6+1=7" "6+2=8" "Sum > 8" "Sum > 8" "Sum > 8" "Sum > 8"
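
For comparison, the condition can also be applied to all cells at once with ifelse(). This is a sketch under the assumption that the helper names row.idx, col.idx, total and string.mat3 are free to use; they do not appear in the exercise:

```r
empty <- matrix(NA, nrow = 6, ncol = 6)
row.idx <- row(empty)        # row number of every cell
col.idx <- col(empty)        # column number of every cell
total <- row.idx + col.idx   # sum for every cell

# ifelse() keeps the shape of its test, so the result is again a 6 x 6 matrix.
string.mat3 <- ifelse(total <= 8,
                      paste0(row.idx, "+", col.idx, "=", total),
                      "Sum > 8")
string.mat3
```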

The anscombe data set, published in 1973 by Francis J. Anscombe, demonstrates that pairs of variables can have the same statistical properties while having completely different graphical representations. We will use this data set more often this week. If you’d like to know more about anscombe, you can call ?anscombe to open its help page.

You can call anscombe directly from your console because the datasets package is a base package in R. This means that it is always included and loaded when you start an R instance. In general, when you would like to access functions or data sets from packages that are not automatically loaded, you don’t have to explicitly load the package. You can also call package::thing-we-need to directly ‘grab’ the thing-we-need from the package namespace. For example,

test <- datasets::anscombe
identical(test, anscombe) #test if identical

## [1] TRUE

This is especially handy within functions, as we can call package::function-name to borrow functionality from installed packages without attaching the whole package with library(). R still loads the package's namespace in the background, so the main benefits are that the search path stays clean and that it is always clear which package a function comes from.
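
To be precise about what the :: operator does: it loads the package's namespace but does not attach the package to the search path. A small sketch, using the tools package that ships with R (the file name "results.csv" is made up for illustration):

```r
# Call a single function from 'tools' without running library(tools).
ext <- tools::file_ext("results.csv")
ext

# The namespace has been loaded to make the call possible ...
"tools" %in% loadedNamespaces()

# ... but the package is only on the search path if it was attached elsewhere.
"package:tools" %in% search()
```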

Display summary statistics (for example, using summary) of each column of the anscombe dataset from the datasets package.

# Using i as an indicator for the current column.
for (i in 1:ncol(anscombe)) {
  print(colnames(anscombe)[i])
  print(summary(anscombe[, i]))
}

## [1] "x1"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     4.0     6.5     9.0     9.0    11.5    14.0 
## [1] "x2"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     4.0     6.5     9.0     9.0    11.5    14.0 
## [1] "x3"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     4.0     6.5     9.0     9.0    11.5    14.0 
## [1] "x4"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       8       8       8       9       8      19 
## [1] "y1"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   4.260   6.315   7.580   7.501   8.570  10.840 
## [1] "y2"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.100   6.695   8.140   7.501   8.950   9.260 
## [1] "y3"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5.39    6.25    7.11    7.50    7.98   12.74 
## [1] "y4"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.250   6.170   7.040   7.501   8.190  12.500

# Looping over the variables directly.
# Although the code is a bit clearer, the loop variable now holds only the
# values of each column, so we cannot access the variable names.
# The output is therefore less clear.
for (i in anscombe) {
  print(summary(i))
}

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     4.0     6.5     9.0     9.0    11.5    14.0 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     4.0     6.5     9.0     9.0    11.5    14.0 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     4.0     6.5     9.0     9.0    11.5    14.0 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       8       8       8       9       8      19 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   4.260   6.315   7.580   7.501   8.570  10.840 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.100   6.695   8.140   7.501   8.950   9.260 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5.39    6.25    7.11    7.50    7.98   12.74 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.250   6.170   7.040   7.501   8.190  12.500
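
A third option, sketched here with the assumed loop variable nm, is to loop over the column names. This keeps the loop header readable and still gives access to both the name and the values:

```r
# names(anscombe) returns the column names; [[nm]] extracts that column.
for (nm in names(anscombe)) {
  print(nm)
  print(summary(anscombe[[nm]]))
}
```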

Display summary statistics of each column of the anscombe dataset using apply().

apply(X = anscombe, MARGIN = 2, FUN = summary)

##           x1   x2   x3 x4        y1       y2    y3        y4
## Min.     4.0  4.0  4.0  8  4.260000 3.100000  5.39  5.250000
## 1st Qu.  6.5  6.5  6.5  8  6.315000 6.695000  6.25  6.170000
## Median   9.0  9.0  9.0  8  7.580000 8.140000  7.11  7.040000
## Mean     9.0  9.0  9.0  9  7.500909 7.500909  7.50  7.500909
## 3rd Qu. 11.5 11.5 11.5  8  8.570000 8.950000  7.98  8.190000
## Max.    14.0 14.0 14.0 19 10.840000 9.260000 12.74 12.500000

Remember that in R, the first index in square brackets always refers to the row and the second index always refers to the column, such that anscombe[2, 3] gives us the value at the intersection of the second row and the third column. The same rationale applies to the margins we would like apply() to iterate over. The argument MARGIN = 2 specifies the columns, while MARGIN = 1 indicates that a function should be applied over the rows:

apply(anscombe, 1, summary)

##            [,1]   [,2]     [,3]    [,4]     [,5]    [,6]   [,7]     [,8]
## Min.     6.5800 5.7600  7.58000 7.11000  7.81000  7.0400 5.2500  3.10000
## 1st Qu.  7.8650 6.9050  7.92750 8.57750  8.24750  8.0750 6.0000  4.00000
## Median   8.5900 8.0000 10.74000 8.82500  8.86500  9.4000 6.0400  4.13000
## Mean     8.6525 7.4525 10.47125 8.56625  9.35875 10.4925 6.3375  7.03125
## 3rd Qu. 10.0000 8.0000 13.00000 9.00000 11.00000 14.0000 6.4075  7.16750
## Max.    10.0000 8.1400 13.00000 9.00000 11.00000 14.0000 8.0000 19.00000
##            [,9]   [,10] [,11]
## Min.     5.5600 4.82000 4.740
## 1st Qu.  8.1125 6.85500 5.000
## Median   9.9850 7.00000 5.340
## Mean     9.7100 6.92625 5.755
## 3rd Qu. 12.0000 7.42250 6.020
## Max.    12.0000 8.00000 8.000

We now see a returned matrix of 11 columns, which gives us the summary() of each of the 11 rows in the anscombe data set.

dim(anscombe)

## [1] 11  8
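
The effect of MARGIN is easiest to see on a small matrix. A minimal sketch, in which the object names m, row.sums and col.sums are illustrative assumptions:

```r
m <- matrix(1:6, nrow = 2)    # 2 rows, 3 columns, filled column by column
m

row.sums <- apply(m, 1, sum)  # MARGIN = 1: one result per row
col.sums <- apply(m, 2, sum)  # MARGIN = 2: one result per column

row.sums
col.sums
```

The first call returns two values (one per row) and the second returns three (one per column), mirroring the [row, column] order of the square brackets.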

Display summary statistics of each column of the anscombe dataset using sapply().

sapply(anscombe, summary)

##           x1   x2   x3 x4        y1       y2    y3        y4
## Min.     4.0  4.0  4.0  8  4.260000 3.100000  5.39  5.250000
## 1st Qu.  6.5  6.5  6.5  8  6.315000 6.695000  6.25  6.170000
## Median   9.0  9.0  9.0  8  7.580000 8.140000  7.11  7.040000
## Mean     9.0  9.0  9.0  9  7.500909 7.500909  7.50  7.500909
## 3rd Qu. 11.5 11.5 11.5  8  8.570000 8.950000  7.98  8.190000
## Max.    14.0 14.0 14.0 19 10.840000 9.260000 12.74 12.500000

We can see that sapply() returns a matrix. We don’t have to specify any margins as the anscombe data set is of class data.frame:

class(anscombe)

## [1] "data.frame"

Objects of class data.frame can be addressed as a list, where the columns are the listed elements (see Lecture B). The function summary() will automatically be applied over the listed elements.
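
The list nature of a data frame can be checked directly. The snippet below is a small sketch; x1.mean is an illustrative name:

```r
is.list(anscombe)    # a data frame is a list of columns
length(anscombe)     # the number of list elements equals the number of columns

# Double square brackets extract a single list element, here the column x1.
x1.mean <- mean(anscombe[["x1"]])
x1.mean
```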

Display summary statistics of each column of the anscombe dataset using lapply().

lapply(anscombe, summary)

## $x1
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     4.0     6.5     9.0     9.0    11.5    14.0 
## 
## $x2
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     4.0     6.5     9.0     9.0    11.5    14.0 
## 
## $x3
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     4.0     6.5     9.0     9.0    11.5    14.0 
## 
## $x4
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       8       8       8       9       8      19 
## 
## $y1
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   4.260   6.315   7.580   7.501   8.570  10.840 
## 
## $y2
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.100   6.695   8.140   7.501   8.950   9.260 
## 
## $y3
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5.39    6.25    7.11    7.50    7.98   12.74 
## 
## $y4
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.250   6.170   7.040   7.501   8.190  12.500

The function lapply() behaves much like sapply(). In fact, sapply() is a more user-friendly wrapper around lapply() that tries to simplify the result to a vector or matrix, whereas lapply() always returns a list. I personally prefer the sapply() solution, or the equivalent apply() over the columns, for the anscombe data set. However, if a data set has many dimensions, the list returned by lapply() may be much more flexible to work with.
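
The difference in return types is easy to inspect with a function that returns a single number. The sketch below also shows vapply(), a stricter relative of sapply() that checks the type of every result; the object names are assumptions made for illustration:

```r
means.list <- lapply(anscombe, mean)  # always a list, one element per column
means.vec  <- sapply(anscombe, mean)  # simplified to a named numeric vector

# vapply() requires every result to match FUN.VALUE: here, one number each.
means.safe <- vapply(anscombe, mean, FUN.VALUE = numeric(1))

class(means.list)
class(means.vec)
```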

Write a function that takes a vector of numbers as input, and returns a string containing “The mean is XXX”, where XXX should be, of course, the mean of the input vector.

giveMeanAsString <- function(x) {
  paste("The mean is", mean(x))
}

Apply this to each column of anscombe.

sapply(anscombe, giveMeanAsString)

##                             x1                             x2 
##                "The mean is 9"                "The mean is 9" 
##                             x3                             x4 
##                "The mean is 9"                "The mean is 9" 
##                             y1                             y2 
## "The mean is 7.50090909090909" "The mean is 7.50090909090909" 
##                             y3                             y4 
##              "The mean is 7.5" "The mean is 7.50090909090909"

Now modify your function to round() off the means to have a single decimal, and apply it again to see the results.

giveRoundedMeanAsString <- function(x) {
  paste("The mean is", round(mean(x), 1))
}
sapply(anscombe, giveRoundedMeanAsString)

##                x1                x2                x3                x4 
##   "The mean is 9"   "The mean is 9"   "The mean is 9"   "The mean is 9" 
##                y1                y2                y3                y4 
## "The mean is 7.5" "The mean is 7.5" "The mean is 7.5" "The mean is 7.5"
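
Note that round() changes the number but not how it is printed, so a mean of exactly 9 still appears as "9". If every mean should show one decimal, sprintf() is one way to do that. This is a sketch and the function name giveFormattedMeanAsString is hypothetical:

```r
giveFormattedMeanAsString <- function(x) {
  # "%.1f" formats a number with exactly one digit after the decimal point.
  sprintf("The mean is %.1f", mean(x))
}
sapply(anscombe, giveFormattedMeanAsString)
```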

The mammalsleep data set from the mice package shows data collected by Allison and Cicchetti (1976). It holds information for 62 mammal species on the interrelationship between sleep, ecological, and constitutional variables. The dataset contains missing values on five variables, which poses challenges when analyses include these variables.

We will use this data set more frequently this week, but we use it only once today. We can therefore call mice::mammalsleep to obtain only the mammalsleep data set, without attaching the whole mice package.

Write a function that takes a vector as input, and that returns a string that contains either (1) the mean and standard deviation (sd()) of the vector, if the vector is numeric, or (2) the levels of the vector, if it is categorical.

(a) Apply this function over each column of the mammalsleep dataset from the mice package.

columnInfo <- function(x) {
  if (is.numeric(x)) {
    return(paste("The mean is", round(mean(x), 2), "and the sd is", round(sd(x), 2)))
  } else {
    return(paste(levels(x), collapse = ", "))
  }
}
sapply(mice::mammalsleep, columnInfo)

##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                species 
## "African elephant, African giant pouched rat, Arctic Fox, Arctic ground squirrel, Asian elephant, Baboon, Big brown bat, Brazilian tapir, Cat, Chimpanzee, Chinchilla, Cow, Desert hedgehog, Donkey, Eastern American mole, Echidna, European hedgehog, Galago, Genet, Giant armadillo, Giraffe, Goat, Golden hamster, Gorilla, Gray seal, Gray wolf, Ground squirrel, Guinea pig, Horse, Jaguar, Kangaroo, Lesser short-tailed shrew, Little brown bat, Man, Mole rat, Mountain beaver, Mouse, Musk shrew, N. American opossum, Nine-banded armadillo, Okapi, Owl monkey, Patas monkey, Phanlanger, Pig, Rabbit, Raccoon, Rat, Red fox, Rhesus monkey, Rock hyrax (Hetero. b), Rock hyrax (Procavia hab), Roe deer, Sheep, Slow loris, Star nosed mole, Tenrec, Tree hyrax, Tree shrew, Vervet, Water opossum, Yellow-bellied marmot" 
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     bw 
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              "The mean is 198.79 and the sd is 899.16" 
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    brw 
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              "The mean is 283.13 and the sd is 930.28" 
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    sws 
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      "The mean is NA and the sd is NA" 
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     ps 
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      "The mean is NA and the sd is NA" 
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     ts 
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      "The mean is NA and the sd is NA" 
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    mls 
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      "The mean is NA and the sd is NA" 
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     gt 
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      "The mean is NA and the sd is NA" 
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     pi 
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  "The mean is 2.87 and the sd is 1.48" 
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    sei 
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   "The mean is 2.42 and the sd is 1.6" 
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    odi 
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  "The mean is 2.61 and the sd is 1.44"
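
The NA results above arise because mean() and sd() return NA as soon as their input contains a missing value. A minimal sketch with a made-up vector v:

```r
v <- c(1, NA, 3)

mean.with.na <- mean(v)                # NA: the missing value propagates
mean.no.na   <- mean(v, na.rm = TRUE)  # the NA is dropped before averaging
sd.no.na     <- sd(v, na.rm = TRUE)    # sd of the two observed values

mean.with.na
mean.no.na
sd.no.na
```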

(b) Does this function work for all of the columns? If not, fix it.

# We need to set na.rm = TRUE in mean() and sd() so that missing values are skipped.
columnInfo <- function(x) {
  if (is.numeric(x)) {
    return(paste("The mean is", round(mean(x, na.rm = TRUE), 2), 
                 "and sd is",   round(sd(x, na.rm = TRUE), 2)))
  } else {
    return(paste(levels(x), collapse = ", "))
  }
}
sapply(mice::mammalsleep, columnInfo)

## species
## "African elephant, African giant pouched rat, Arctic Fox, Arctic ground squirrel, Asian elephant, Baboon, Big brown bat, Brazilian tapir, Cat, Chimpanzee, Chinchilla, Cow, Desert hedgehog, Donkey, Eastern American mole, Echidna, European hedgehog, Galago, Genet, Giant armadillo, Giraffe, Goat, Golden hamster, Gorilla, Gray seal, Gray wolf, Ground squirrel, Guinea pig, Horse, Jaguar, Kangaroo, Lesser short-tailed shrew, Little brown bat, Man, Mole rat, Mountain beaver, Mouse, Musk shrew, N. American opossum, Nine-banded armadillo, Okapi, Owl monkey, Patas monkey, Phanlanger, Pig, Rabbit, Raccoon, Rat, Red fox, Rhesus monkey, Rock hyrax (Hetero. b), Rock hyrax (Procavia hab), Roe deer, Sheep, Slow loris, Star nosed mole, Tenrec, Tree hyrax, Tree shrew, Vervet, Water opossum, Yellow-bellied marmot"
## bw
## "The mean is 198.79 and sd is 899.16"
## brw
## "The mean is 283.13 and sd is 930.28"
## sws
## "The mean is 8.67 and sd is 3.67"
## ps
## "The mean is 1.97 and sd is 1.44"
## ts
## "The mean is 10.53 and sd is 4.61"
## mls
## "The mean is 19.88 and sd is 18.21"
## gt
## "The mean is 142.35 and sd is 146.81"
## pi
## "The mean is 2.87 and sd is 1.48"
## sei
## "The mean is 2.42 and sd is 1.6"
## odi
## "The mean is 2.61 and sd is 1.44"
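
To see why na.rm = TRUE matters, it can help to try the function on a much smaller example. The sketch below is only an illustration: the data frame toy and its columns score and group are made up for this purpose and are not part of mammalsleep. The function columnInfo is repeated so that the block runs on its own.

```r
# Hypothetical toy data (not from mammalsleep): a numeric column with
# one missing value, and a factor column.
toy <- data.frame(
  score = c(1, NA, 3),
  group = factor(c("a", "b", "a"))
)

# A single NA makes the plain mean NA; na.rm = TRUE skips it.
mean(toy$score)
mean(toy$score, na.rm = TRUE)

# Same function as in the solution, repeated so this block is self-contained.
columnInfo <- function(x) {
  if (is.numeric(x)) {
    return(paste("The mean is", round(mean(x, na.rm = TRUE), 2),
                 "and sd is",   round(sd(x, na.rm = TRUE), 2)))
  } else {
    return(paste(levels(x), collapse = ", "))
  }
}

# One summary string per column, named after the column.
result <- sapply(toy, columnInfo)
result
```

The first call to mean() gives NA and the second gives 2. The numeric column is summarised with its missing value skipped, while the factor column is summarised by listing its levels.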

End of Practical

Gerko Vink and Kees Mulder

Statistical Programming with R
