for-loop that loops over all numbers between 0 and 10, but only prints numbers below 5.
for (i in 0:10) {
  if (i < 5) {
    print(i)
  }
}
## [1] 0
## [1] 1
## [1] 2
## [1] 3
## [1] 4
for (i in 0:10) {
  if (i >= 3 & i <= 5) {
    print(i)
  }
}
## [1] 3
## [1] 4
## [1] 5
Or, more concisely:
for (i in 0:10) {
  if (i %in% 3:5) {
    print(i)
  }
}
## [1] 3
## [1] 4
## [1] 5
num <- 0:10
num[num >= 3 & num <= 5]
## [1] 3 4 5
or, alternatively,
subset(num, num >= 3 & num <= 5)
## [1] 3 4 5
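The loop and the vectorised selection should pick out the same values. A minimal sketch that checks this, where the names looped and vectorised are illustrative and not part of the practical:

```r
num <- 0:10

# Collect the loop results in a vector instead of printing them.
looped <- integer(0)
for (i in num) {
  if (i >= 3 & i <= 5) {
    looped <- c(looped, i)
  }
}

# Logical indexing performs the same selection in one step.
vectorised <- num[num >= 3 & num <= 5]

identical(looped, vectorised)
```

Growing a vector inside a loop is fine for eleven values, but for long vectors the indexing version is usually the better choice.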
byrow = TRUE to fill a matrix left-to-right instead of top-to-bottom.
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 1 2 3 4 5 6 7 8
## [2,] 2 4 6 8 10 12 14 16
## [3,] 3 6 9 12 15 18 21 24
## [4,] 4 8 12 16 20 24 28 32
## [5,] 5 10 15 20 25 30 35 40
# Create a 5 x 8 matrix in which every row holds the numbers 1 to 8.
mat <- matrix(1:8, ncol = 8, nrow = 5, byrow = TRUE)
# Loop over the rows and multiply each row by its row number.
for (i in 1:5) {
  mat[i, ] <- mat[i, ] * i
}
mat
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 1 2 3 4 5 6 7 8
## [2,] 2 4 6 8 10 12 14 16
## [3,] 3 6 9 12 15 18 21 24
## [4,] 4 8 12 16 20 24 28 32
## [5,] 5 10 15 20 25 30 35 40
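The same table can also be built without a loop. The sketch below is one possible alternative, not the intended solution of the exercise; the names mat.vec and mat.outer are made up for illustration. It relies on R recycling a vector down the columns of a matrix.

```r
mat <- matrix(1:8, ncol = 8, nrow = 5, byrow = TRUE)

# R recycles 1:5 down each column, so element [i, j] is multiplied by i.
mat.vec <- mat * 1:5

# The same table as the outer product of the row and column numbers.
mat.outer <- outer(1:5, 1:8)

all(mat.vec == mat.outer)
```

The recycling trick only works because the vector has exactly as many elements as the matrix has rows; with a vector of another length the values would line up differently.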
string.mat <- matrix(NA, ncol = 6, nrow = 6)
for (i in 1:6) {
  for (j in 1:6) {
    string.mat[i, j] <- paste(i, "+", j, "=", i + j, sep = "")
  }
}
string.mat
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] "1+1=2" "1+2=3" "1+3=4" "1+4=5" "1+5=6" "1+6=7"
## [2,] "2+1=3" "2+2=4" "2+3=5" "2+4=6" "2+5=7" "2+6=8"
## [3,] "3+1=4" "3+2=5" "3+3=6" "3+4=7" "3+5=8" "3+6=9"
## [4,] "4+1=5" "4+2=6" "4+3=7" "4+4=8" "4+5=9" "4+6=10"
## [5,] "5+1=6" "5+2=7" "5+3=8" "5+4=9" "5+5=10" "5+6=11"
## [6,] "6+1=7" "6+2=8" "6+3=9" "6+4=10" "6+5=11" "6+6=12"
"Sum > 8"
in the matrix in the cells where that is true.
string.mat <- matrix(NA, ncol = 6, nrow = 6)
for (i in 1:6) {
  for (j in 1:6) {
    if (i + j <= 8) {
      string.mat[i, j] <- paste(i, "+", j, "=", i + j, sep = "")
    } else {
      string.mat[i, j] <- "Sum > 8"
    }
  }
}
string.mat
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] "1+1=2" "1+2=3" "1+3=4" "1+4=5" "1+5=6" "1+6=7"
## [2,] "2+1=3" "2+2=4" "2+3=5" "2+4=6" "2+5=7" "2+6=8"
## [3,] "3+1=4" "3+2=5" "3+3=6" "3+4=7" "3+5=8" "Sum > 8"
## [4,] "4+1=5" "4+2=6" "4+3=7" "4+4=8" "Sum > 8" "Sum > 8"
## [5,] "5+1=6" "5+2=7" "5+3=8" "Sum > 8" "Sum > 8" "Sum > 8"
## [6,] "6+1=7" "6+2=8" "Sum > 8" "Sum > 8" "Sum > 8" "Sum > 8"
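For comparison, the nested loop with the if/else can be expressed with outer() and ifelse(). This is a sketch of an alternative rather than the solution the exercise asks for; sumLabel and string.mat.vec are hypothetical names.

```r
# Hypothetical helper: builds the label for many (i, j) pairs at once.
sumLabel <- function(i, j) {
  ifelse(i + j <= 8,
         paste(i, "+", j, "=", i + j, sep = ""),
         "Sum > 8")
}

# outer() passes every combination of 1:6 and 1:6 to sumLabel()
# and arranges the result as a 6 x 6 matrix.
string.mat.vec <- outer(1:6, 1:6, sumLabel)
string.mat.vec[3, 5:6]
```

Note that outer() requires a function that is vectorised over both arguments, which is why ifelse() is used here instead of if/else.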
The anscombe data set is a wonderful data set from 1973 by Francis J. Anscombe that demonstrates that pairs of variables can have the same statistical properties while having completely different graphical representations. We will be using this data set more this week. If you’d like to know more about anscombe, you can simply call ?anscombe to open its help page.
You can call anscombe directly from your console because the datasets package is a base package in R. This means that it is always included and loaded when you start an R instance. In general, when we would like to access functions or data sets from packages that are not automatically loaded, we don’t have to explicitly load the package. We can also call package::thing-we-need to directly ‘grab’ the thing-we-need from the package namespace. For example,
test <- datasets::anscombe
identical(test, anscombe) #test if identical
## [1] TRUE
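This is especially handy within functions, as we can call package::function-name to borrow functionality from installed packages without attaching the whole package. Note that package::function-name still loads the package namespace behind the scenes; it only skips attaching the package to the search path. The gain is therefore mainly in clarity: it is explicit where each function comes from, and functions with the same name in other packages are not masked.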
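To see what the :: operator does with a package, we can inspect the session after using it. The sketch below uses the tools package, chosen only because it ships with every R installation and is not attached in a default session; the file name is made up.

```r
# Borrow a single function from tools without calling library(tools).
ext <- tools::file_ext("results.csv")
ext

# The call above has loaded the tools namespace ...
isNamespaceLoaded("tools")

# ... but tools is only on the search path if it was attached with library().
"package:tools" %in% search()
```

In a fresh session the last line is expected to return FALSE, which shows that the function was used without the package being attached.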
summary) of each column of the anscombe dataset from the datasets package
# Using i as an indicator for the current column.
for (i in 1:ncol(anscombe)) {
  print(colnames(anscombe)[i])
  print(summary(anscombe[, i]))
}
## [1] "x1"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.0 6.5 9.0 9.0 11.5 14.0
## [1] "x2"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.0 6.5 9.0 9.0 11.5 14.0
## [1] "x3"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.0 6.5 9.0 9.0 11.5 14.0
## [1] "x4"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8 8 8 9 8 19
## [1] "y1"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.260 6.315 7.580 7.501 8.570 10.840
## [1] "y2"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.100 6.695 8.140 7.501 8.950 9.260
## [1] "y3"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.39 6.25 7.11 7.50 7.98 12.74
## [1] "y4"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.250 6.170 7.040 7.501 8.190 12.500
# Looping over the variables directly.
# Although the code is a bit clearer, this does mean that we cannot access the names of the variables,
# so the output is less clear.
for (i in anscombe) {
  print(summary(i))
}
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.0 6.5 9.0 9.0 11.5 14.0
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.0 6.5 9.0 9.0 11.5 14.0
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.0 6.5 9.0 9.0 11.5 14.0
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8 8 8 9 8 19
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.260 6.315 7.580 7.501 8.570 10.840
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.100 6.695 8.140 7.501 8.950 9.260
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.39 6.25 7.11 7.50 7.98 12.74
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.250 6.170 7.040 7.501 8.190 12.500
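A middle ground between the two loops is to iterate over the column names. The sketch below keeps the names available without using a numeric index; col.means is an illustrative object that is not part of the exercise.

```r
col.means <- numeric(0)
for (name in names(anscombe)) {
  # The name is available for labelling the output ...
  cat("Column:", name, "\n")
  # ... and for extracting the column itself.
  print(summary(anscombe[[name]]))
  col.means[name] <- mean(anscombe[[name]])
}
col.means
```

Assigning by name, as in col.means[name], stores each result under the name of the column it belongs to.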
anscombe dataset using apply().
apply(X = anscombe, MARGIN = 2, FUN = summary)
## x1 x2 x3 x4 y1 y2 y3 y4
## Min. 4.0 4.0 4.0 8 4.260000 3.100000 5.39 5.250000
## 1st Qu. 6.5 6.5 6.5 8 6.315000 6.695000 6.25 6.170000
## Median 9.0 9.0 9.0 8 7.580000 8.140000 7.11 7.040000
## Mean 9.0 9.0 9.0 9 7.500909 7.500909 7.50 7.500909
## 3rd Qu. 11.5 11.5 11.5 8 8.570000 8.950000 7.98 8.190000
## Max. 14.0 14.0 14.0 19 10.840000 9.260000 12.74 12.500000
Remember that in R, the first index in square brackets always indicates the row, and the second index always indicates the column, such that anscombe[2, 3] would give us the value at the intersection of the second row and the third column. The same rationale translates to the margins we would like apply() to iterate over. The argument MARGIN = 2 specifies the columns, while MARGIN = 1 would indicate that a function should be applied over the rows:
apply(anscombe, 1, summary)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## Min. 6.5800 5.7600 7.58000 7.11000 7.81000 7.0400 5.2500 3.10000
## 1st Qu. 7.8650 6.9050 7.92750 8.57750 8.24750 8.0750 6.0000 4.00000
## Median 8.5900 8.0000 10.74000 8.82500 8.86500 9.4000 6.0400 4.13000
## Mean 8.6525 7.4525 10.47125 8.56625 9.35875 10.4925 6.3375 7.03125
## 3rd Qu. 10.0000 8.0000 13.00000 9.00000 11.00000 14.0000 6.4075 7.16750
## Max. 10.0000 8.1400 13.00000 9.00000 11.00000 14.0000 8.0000 19.00000
## [,9] [,10] [,11]
## Min. 5.5600 4.82000 4.740
## 1st Qu. 8.1125 6.85500 5.000
## Median 9.9850 7.00000 5.340
## Mean 9.7100 6.92625 5.755
## 3rd Qu. 12.0000 7.42250 6.020
## Max. 12.0000 8.00000 8.000
We now see a returned matrix of 11 columns that gives us the summary() of each of the 11 rows in the anscombe data set. We can verify the number of rows and columns with dim():
dim(anscombe)
## [1] 11 8
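The effect of MARGIN is easiest to see on a small matrix. The following sketch uses a made-up 3 x 2 matrix m, which is not part of the practical:

```r
# A small matrix with 3 rows and 2 columns.
m <- matrix(1:6, nrow = 3)

row.sums <- apply(m, 1, sum)   # one value per row
col.sums <- apply(m, 2, sum)   # one value per column

# range() returns two values per row; apply() stores them as columns,
# so three rows result in a matrix with three columns.
row.ranges <- apply(m, 1, range)
dim(row.ranges)
```

This is why the row-wise summary of anscombe comes back with one column per row of the data: each function result is stored as a column of the returned matrix.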
anscombe dataset using sapply().
sapply(anscombe, summary)
## x1 x2 x3 x4 y1 y2 y3 y4
## Min. 4.0 4.0 4.0 8 4.260000 3.100000 5.39 5.250000
## 1st Qu. 6.5 6.5 6.5 8 6.315000 6.695000 6.25 6.170000
## Median 9.0 9.0 9.0 8 7.580000 8.140000 7.11 7.040000
## Mean 9.0 9.0 9.0 9 7.500909 7.500909 7.50 7.500909
## 3rd Qu. 11.5 11.5 11.5 8 8.570000 8.950000 7.98 8.190000
## Max. 14.0 14.0 14.0 19 10.840000 9.260000 12.74 12.500000
We can see that sapply() returns a matrix. We don’t have to specify any margins as the anscombe data set is of class data.frame:
class(anscombe)
## [1] "data.frame"
Objects of class data.frame can be addressed as a list, where the columns are the listed elements (see Lecture B). The function summary() will automatically be applied over the listed elements.
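That a data frame can be treated as a list of columns can be checked directly. A short sketch, with manual and via.sapply as illustrative names:

```r
is.list(anscombe)     # a data frame is a list ...
length(anscombe)      # ... with one element per column

# Summarising one list element by hand ...
manual <- summary(anscombe[["y1"]])

# ... gives the same numbers as the matching column of the sapply() result.
via.sapply <- sapply(anscombe, summary)[, "y1"]
all.equal(as.numeric(manual), as.numeric(via.sapply))
```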
anscombe dataset using lapply().
lapply(anscombe, summary)
## $x1
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.0 6.5 9.0 9.0 11.5 14.0
##
## $x2
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.0 6.5 9.0 9.0 11.5 14.0
##
## $x3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.0 6.5 9.0 9.0 11.5 14.0
##
## $x4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8 8 8 9 8 19
##
## $y1
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.260 6.315 7.580 7.501 8.570 10.840
##
## $y2
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.100 6.695 8.140 7.501 8.950 9.260
##
## $y3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.39 6.25 7.11 7.50 7.98 12.74
##
## $y4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.250 6.170 7.040 7.501 8.190 12.500
Function lapply() behaves just like sapply(), but returns a list rather than a matrix. In fact, sapply() is a more user-friendly version of lapply() that simplifies the result to a vector or matrix where possible. I, personally, prefer the sapply() solution, or the equivalent apply() over the columns, for the anscombe data set. However, if a data set has many dimensions, the return from lapply() may be much more flexible to work with.
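The relation between the two functions can be made concrete with a function that returns a single number per column. The sketch below also shows vapply(), a stricter relative of sapply() that is not covered in this practical; the object names are illustrative.

```r
means.list <- lapply(anscombe, mean)   # a list with one element per column
means.vec  <- sapply(anscombe, mean)   # simplified to a named numeric vector

class(means.list)
class(means.vec)

# Flattening the list gives the same values as sapply().
all.equal(unlist(means.list), means.vec)

# vapply() requires us to state the expected type and length of each result
# and stops with an error if a result does not match.
means.safe <- vapply(anscombe, mean, numeric(1))
```

Stating the expected result with numeric(1) makes vapply() a reasonable choice inside functions, where an unexpected return shape should fail loudly.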
giveMeanAsString <- function(x) {
  paste("The mean is", mean(x))
}
anscombe.
sapply(anscombe, giveMeanAsString)
## x1 x2
## "The mean is 9" "The mean is 9"
## x3 x4
## "The mean is 9" "The mean is 9"
## y1 y2
## "The mean is 7.50090909090909" "The mean is 7.50090909090909"
## y3 y4
## "The mean is 7.5" "The mean is 7.50090909090909"
round() off the means to have a single decimal, and apply it again to see the results.
giveRoundedMeanAsString <- function(x) {
  paste("The mean is", round(mean(x), 1))
}
sapply(anscombe, giveRoundedMeanAsString)
## x1 x2 x3 x4
## "The mean is 9" "The mean is 9" "The mean is 9" "The mean is 9"
## y1 y2 y3 y4
## "The mean is 7.5" "The mean is 7.5" "The mean is 7.5" "The mean is 7.5"
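Note that round() changes the number but not how it is printed, so a mean of exactly 9 is shown as 9 rather than 9.0. If a fixed number of decimals is wanted, sprintf() is one option. A sketch, with giveFormattedMeanAsString as a hypothetical variant of the function above:

```r
giveFormattedMeanAsString <- function(x) {
  # "%.1f" formats the number with exactly one decimal.
  sprintf("The mean is %.1f", mean(x))
}

formatted <- sapply(anscombe, giveFormattedMeanAsString)
formatted[c("x1", "y1")]
```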
The mammalsleep data set from the mice package shows data collected by Allison and Cicchetti (1976). It holds information for 62 mammal species on the interrelationship between sleep, ecological, and constitutional variables. The dataset contains missing values on five variables, which poses challenges when analyses include these variables.
We will use this data set more frequently this week, but we use it only once today. Therefore, we can simply call mice::mammalsleep to obtain the mammalsleep data set without attaching the whole mice package.
sd()) of the vector, if the vector is numeric, or (2) the levels of the vector, if it is categorical.
mammalsleep dataset from the mice package.
columnInfo <- function(x) {
  if (is.numeric(x)) {
    return(paste("The mean is", round(mean(x), 2), "and the sd is", round(sd(x), 2)))
  } else {
    return(paste(levels(x), collapse = ", "))
  }
}
sapply(mice::mammalsleep, columnInfo)
## species
## "African elephant, African giant pouched rat, Arctic Fox, Arctic ground squirrel, Asian elephant, Baboon, Big brown bat, Brazilian tapir, Cat, Chimpanzee, Chinchilla, Cow, Desert hedgehog, Donkey, Eastern American mole, Echidna, European hedgehog, Galago, Genet, Giant armadillo, Giraffe, Goat, Golden hamster, Gorilla, Gray seal, Gray wolf, Ground squirrel, Guinea pig, Horse, Jaguar, Kangaroo, Lesser short-tailed shrew, Little brown bat, Man, Mole rat, Mountain beaver, Mouse, Musk shrew, N. American opossum, Nine-banded armadillo, Okapi, Owl monkey, Patas monkey, Phanlanger, Pig, Rabbit, Raccoon, Rat, Red fox, Rhesus monkey, Rock hyrax (Hetero. b), Rock hyrax (Procavia hab), Roe deer, Sheep, Slow loris, Star nosed mole, Tenrec, Tree hyrax, Tree shrew, Vervet, Water opossum, Yellow-bellied marmot"
## bw
## "The mean is 198.79 and the sd is 899.16"
## brw
## "The mean is 283.13 and the sd is 930.28"
## sws
## "The mean is NA and the sd is NA"
## ps
## "The mean is NA and the sd is NA"
## ts
## "The mean is NA and the sd is NA"
## mls
## "The mean is NA and the sd is NA"
## gt
## "The mean is NA and the sd is NA"
## pi
## "The mean is 2.87 and the sd is 1.48"
## sei
## "The mean is 2.42 and the sd is 1.6"
## odi
## "The mean is 2.61 and the sd is 1.44"
# We need to set na.rm = TRUE in mean() and sd() so that missing values are skipped.
columnInfo <- function(x) {
  if (is.numeric(x)) {
    return(paste("The mean is", round(mean(x, na.rm = TRUE), 2),
                 "and sd is", round(sd(x, na.rm = TRUE), 2)))
  } else {
    return(paste(levels(x), collapse = ", "))
  }
}
sapply(mice::mammalsleep, columnInfo)
## species
## "African elephant, African giant pouched rat, Arctic Fox, Arctic ground squirrel, Asian elephant, Baboon, Big brown bat, Brazilian tapir, Cat, Chimpanzee, Chinchilla, Cow, Desert hedgehog, Donkey, Eastern American mole, Echidna, European hedgehog, Galago, Genet, Giant armadillo, Giraffe, Goat, Golden hamster, Gorilla, Gray seal, Gray wolf, Ground squirrel, Guinea pig, Horse, Jaguar, Kangaroo, Lesser short-tailed shrew, Little brown bat, Man, Mole rat, Mountain beaver, Mouse, Musk shrew, N. American opossum, Nine-banded armadillo, Okapi, Owl monkey, Patas monkey, Phanlanger, Pig, Rabbit, Raccoon, Rat, Red fox, Rhesus monkey, Rock hyrax (Hetero. b), Rock hyrax (Procavia hab), Roe deer, Sheep, Slow loris, Star nosed mole, Tenrec, Tree hyrax, Tree shrew, Vervet, Water opossum, Yellow-bellied marmot"
## bw
## "The mean is 198.79 and sd is 899.16"
## brw
## "The mean is 283.13 and sd is 930.28"
## sws
## "The mean is 8.67 and sd is 3.67"
## ps
## "The mean is 1.97 and sd is 1.44"
## ts
## "The mean is 10.53 and sd is 4.61"
## mls
## "The mean is 19.88 and sd is 18.21"
## gt
## "The mean is 142.35 and sd is 146.81"
## pi
## "The mean is 2.87 and sd is 1.48"
## sei
## "The mean is 2.42 and sd is 1.6"
## odi
## "The mean is 2.61 and sd is 1.44"
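For simple cases, na.rm = TRUE can also be handed to the function through sapply() itself, because sapply() passes any extra arguments on to the function it applies. The sketch below uses a small made-up data frame, toy, so that it runs without the mice package:

```r
# Hypothetical data with missing values in both columns.
toy <- data.frame(a = c(1, 2, NA, 4),
                  b = c(10, NA, NA, 40))

# Without na.rm, a single missing value makes the mean NA.
sapply(toy, mean)

# Extra arguments after the function are passed on to mean().
means <- sapply(toy, mean, na.rm = TRUE)
means

# Counting the missing values per column shows how much data was skipped.
n.missing <- colSums(is.na(toy))
n.missing
```

Reporting the number of missing values next to the mean is a useful habit, since a mean based on very few observed values can be misleading.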
End of Practical
Allison, T., Cicchetti, D.V. (1976). Sleep in Mammals: Ecological and Constitutional Correlates. Science, 194(4266), 732-734.
Anscombe, F.J. (1973). Graphs in statistical analysis. American Statistician, 27, 17–21.