Statistical Programming with R
library(MASS) #for the cats data
If-statements
For-loops
apply()
Writing your own functions
if (cond) expr
if (cond) cons.expr else alt.expr
for (var in seq) expr
rev(): reverse version of an argument
apply(): apply a function to margins of a matrix
sapply(): apply a function to elements of a list; returns a vector or matrix
lapply(): apply a function to elements of a list; returns a list
print(): print an object to the console
cat(): outputs an object with less conversion than print()
Often, we want to run some code only if some condition is true.
For example:
a <- 2
if (a > 5){
  print("a is larger than 5.")
}
a <- 8
if (a > 5){
  print("a is larger than 5.")
}
## [1] "a is larger than 5."
We can also specify something to be run if the condition is not true.
a <- 2
if (a > 5){
  print("a is larger than 5.")
} else {
  print("a is not larger than 5.")
}
## [1] "a is not larger than 5."
a <- 8
if (a > 5){
  print("a is larger than 5.")
} else {
  print("a is not larger than 5.")
}
## [1] "a is larger than 5."
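When more than two cases need to be told apart (for instance, when a is exactly 5), conditions can be chained with else if. A minimal sketch; the variable msg and the message texts are illustrative and not part of the original slides:

```r
# Chain conditions with `else if` to distinguish three cases.
a <- 5
if (a > 5){
  msg <- "a is larger than 5."
} else if (a < 5){
  msg <- "a is smaller than 5."
} else {
  msg <- "a is exactly 5."
}
print(msg)
```

Only the first branch whose condition is TRUE is run; the final else catches everything that is left.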
For loops are used when we want to perform some repetitive calculations.
It is often tedious, or even impossible, to write this repetition out completely.
# Let's print the numbers 1 to 6 one by one.
print(1)
## [1] 1
print(2)
## [1] 2
print(3)
## [1] 3
print(4)
## [1] 4
print(5)
## [1] 5
print(6)
## [1] 6
For-loops allow us to automate this!
for (i in 1:6){
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
for (some.var.name in 1:6){
  print(some.var.name)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
for (i in 1:6){
  print(i < 5)
}
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] FALSE
## [1] FALSE
for (i in 1:nrow(cats)){
  if (cats$Bwt[i] > 2.5){
    cat(i, "is over 2.5. It is:", cats$Bwt[i], "\n")
  }
}
## 37 is over 2.5. It is: 2.6 
## 38 is over 2.5. It is: 2.6 
## 39 is over 2.5. It is: 2.6 
## 40 is over 2.5. It is: 2.7 
## 41 is over 2.5. It is: 2.7 
## 42 is over 2.5. It is: 2.7 
## 43 is over 2.5. It is: 2.9 
## 44 is over 2.5. It is: 2.9 
## 45 is over 2.5. It is: 2.9 
## 46 is over 2.5. It is: 3 
## 47 is over 2.5. It is: 3 
## 73 is over 2.5. It is: 2.6 
## 74 is over 2.5. It is: 2.6 
## 75 is over 2.5. It is: 2.6 
## 76 is over 2.5. It is: 2.6 
## 77 is over 2.5. It is: 2.6 
## 78 is over 2.5. It is: 2.6 
## 79 is over 2.5. It is: 2.7 
## 80 is over 2.5. It is: 2.7 
## 81 is over 2.5. It is: 2.7 
## 82 is over 2.5. It is: 2.7 
## 83 is over 2.5. It is: 2.7 
## 84 is over 2.5. It is: 2.7 
## 85 is over 2.5. It is: 2.7 
## 86 is over 2.5. It is: 2.7 
## 87 is over 2.5. It is: 2.7 
## 88 is over 2.5. It is: 2.8 
## 89 is over 2.5. It is: 2.8 
## 90 is over 2.5. It is: 2.8 
## 91 is over 2.5. It is: 2.8 
## 92 is over 2.5. It is: 2.8 
## 93 is over 2.5. It is: 2.8 
## 94 is over 2.5. It is: 2.8 
## 95 is over 2.5. It is: 2.9 
## 96 is over 2.5. It is: 2.9 
## 97 is over 2.5. It is: 2.9 
## 98 is over 2.5. It is: 2.9 
## 99 is over 2.5. It is: 2.9 
## 100 is over 2.5. It is: 3 
## 101 is over 2.5. It is: 3 
## 102 is over 2.5. It is: 3 
## 103 is over 2.5. It is: 3 
## 104 is over 2.5. It is: 3 
## 105 is over 2.5. It is: 3 
## 106 is over 2.5. It is: 3 
## 107 is over 2.5. It is: 3 
## 108 is over 2.5. It is: 3 
## 109 is over 2.5. It is: 3.1 
## 110 is over 2.5. It is: 3.1 
## 111 is over 2.5. It is: 3.1 
## 112 is over 2.5. It is: 3.1 
## 113 is over 2.5. It is: 3.1 
## 114 is over 2.5. It is: 3.1 
## 115 is over 2.5. It is: 3.2 
## 116 is over 2.5. It is: 3.2 
## 117 is over 2.5. It is: 3.2 
## 118 is over 2.5. It is: 3.2 
## 119 is over 2.5. It is: 3.2 
## 120 is over 2.5. It is: 3.2 
## 121 is over 2.5. It is: 3.3 
## 122 is over 2.5. It is: 3.3 
## 123 is over 2.5. It is: 3.3 
## 124 is over 2.5. It is: 3.3 
## 125 is over 2.5. It is: 3.3 
## 126 is over 2.5. It is: 3.4 
## 127 is over 2.5. It is: 3.4 
## 128 is over 2.5. It is: 3.4 
## 129 is over 2.5. It is: 3.4 
## 130 is over 2.5. It is: 3.4 
## 131 is over 2.5. It is: 3.5 
## 132 is over 2.5. It is: 3.5 
## 133 is over 2.5. It is: 3.5 
## 134 is over 2.5. It is: 3.5 
## 135 is over 2.5. It is: 3.5 
## 136 is over 2.5. It is: 3.6 
## 137 is over 2.5. It is: 3.6 
## 138 is over 2.5. It is: 3.6 
## 139 is over 2.5. It is: 3.6 
## 140 is over 2.5. It is: 3.7 
## 141 is over 2.5. It is: 3.8 
## 142 is over 2.5. It is: 3.8 
## 143 is over 2.5. It is: 3.9 
## 144 is over 2.5. It is: 3.9
for (i in 1:ncol(cats)){
  print(class(cats[, i]))
}
## [1] "factor"
## [1] "numeric"
## [1] "numeric"
empty.cats <- matrix(NA, nrow = nrow(cats), ncol = ncol(cats))
for (i in 1:ncol(cats)){
  empty.cats[, i] <- rep(i, nrow(cats))
}
head(empty.cats)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    1    2    3
## [3,]    1    2    3
## [4,]    1    2    3
## [5,]    1    2    3
## [6,]    1    2    3
for (i in 1:3){
  for (j in 1:3){
    print(paste(i, "x", j, "=", i*j))
  }
}
## [1] "1 x 1 = 1"
## [1] "1 x 2 = 2"
## [1] "1 x 3 = 3"
## [1] "2 x 1 = 2"
## [1] "2 x 2 = 4"
## [1] "2 x 3 = 6"
## [1] "3 x 1 = 3"
## [1] "3 x 2 = 6"
## [1] "3 x 3 = 9"
my.list <- list(A = c(4, 2, 1:3), B = "Hello.", C = TRUE)
for (list.item in my.list){
  cat("One element is", list.item, "\n")
}
## One element is 4 2 1 2 3 
## One element is Hello. 
## One element is TRUE
But if we want to change the items, we have to be able to access them by index, so we are better off looping over 1:length(list), or 1:ncol(data.frame) / 1:nrow(data.frame).
my.list <- list(A = c(4, 2, 1:3), B = "Hello.", C = c(TRUE, TRUE, FALSE))
for (iter in 1:length(my.list)){
  my.list[[iter]] <- rev(my.list[[iter]])
}
my.list
## $A
## [1] 3 2 1 2 4
## 
## $B
## [1] "Hello."
## 
## $C
## [1] FALSE  TRUE  TRUE
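One caveat with 1:length(x): when x is empty, 1:0 counts down and yields 1 0, so the loop body still runs twice. The base R function seq_along() returns an empty sequence in that case. A short sketch; empty.list, n.colon and n.seq are illustrative names, not from the slides:

```r
empty.list <- list()

# 1:length() counts down from 1 to 0 when the list is empty,
# so the body is executed twice.
n.colon <- 0
for (iter in 1:length(empty.list)){
  n.colon <- n.colon + 1
}

# seq_along() gives a zero-length sequence, so the loop is skipped.
n.seq <- 0
for (iter in seq_along(empty.list)){
  n.seq <- n.seq + 1
}

n.colon  # 2
n.seq    # 0
```

For non-empty objects both forms visit the same indices, so seq_along() can be used as a drop-in replacement.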
The apply() family
The apply() family is a group of very useful functions that let you easily execute a function of your choice over the elements of an object, such as a list, a data.frame, or a matrix.
We will look at three examples:
apply
sapply
lapply
apply()
apply() is used for matrices (and sometimes data frames). It takes a function that accepts a vector as input and applies it to each row or column.
MARGIN is 1 for rows, 2 for columns.
apply(cats[, -1], MARGIN = 2, mean)
##       Bwt       Hwt 
##  2.723611 10.630556
But we have seen an easier way to do this:
colMeans(cats[, -1])
##       Bwt       Hwt 
##  2.723611 10.630556
However, the power of apply() is that it can use any function we throw at it.
rand.mat <- matrix(rnorm(21), nrow = 3, ncol = 7)
rand.mat
##            [,1]       [,2]      [,3]       [,4]       [,5]        [,6]
## [1,] -0.5495413  1.6040110 0.2065810 -0.4313313  0.8031660  0.04598861
## [2,] -0.4558570 -0.0929378 0.7613544  1.1739431 -0.5310347  0.45647626
## [3,] -0.9278759  0.5931129 1.7889414  1.0087794  0.3678748 -0.96723643
##             [,7]
## [1,]  0.38257922
## [2,]  0.02234608
## [3,] -0.75743886
apply(rand.mat, MARGIN = 1, FUN = max)
## [1] 1.604011 1.173943 1.788941
apply(rand.mat, MARGIN = 2, FUN = max)
## [1] -0.4558570 1.6040110 1.7889414 1.1739431 0.8031660 0.4564763 0.3825792
rand.mat
##            [,1]       [,2]      [,3]       [,4]       [,5]        [,6]
## [1,] -0.5495413  1.6040110 0.2065810 -0.4313313  0.8031660  0.04598861
## [2,] -0.4558570 -0.0929378 0.7613544  1.1739431 -0.5310347  0.45647626
## [3,] -0.9278759  0.5931129 1.7889414  1.0087794  0.3678748 -0.96723643
##             [,7]
## [1,]  0.38257922
## [2,]  0.02234608
## [3,] -0.75743886
apply(rand.mat, MARGIN = 1, FUN = sum)
## [1] 2.061453 1.334290 1.106157
apply(rand.mat, MARGIN = 2, FUN = var)
## [1] 0.0624526 0.7287026 0.6445955 0.7796840 0.4629347 0.5370138 0.3395789
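"Any function" includes functions written on the spot. As a hedged sketch, the example below passes an anonymous function to apply(); the matrix small.mat and the name row.spread are made up here so that the result is reproducible:

```r
# A small fixed matrix (filled column by column):
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
small.mat <- matrix(1:6, nrow = 2, ncol = 3)

# For each row, compute the spread: largest value minus smallest value.
row.spread <- apply(small.mat, MARGIN = 1, FUN = function(v) max(v) - min(v))
row.spread
```

Each row is handed to the anonymous function as the vector v, and apply() collects the two results into one vector.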
sapply()
sapply() is used on list objects and returns a vector or a matrix, depending on the output of the function.
my.list <- list(A = c(4, 2, 1:3), B = "Hello.", C = TRUE)
sapply(my.list, class)
##           A           B           C 
##   "numeric" "character"   "logical"
my.list <- list(A = c(4, 2, 1:3), B = "Hello.", C = TRUE)
sapply(my.list, range)
##      A   B        C  
## [1,] "1" "Hello." "1"
## [2,] "4" "Hello." "1"
Why is each element a character string?
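One way to approach the question: sapply() tries to simplify its results into a single vector or matrix, and those can hold only one type of value. The same coercion can be seen when values of different types are combined with c(). A small sketch; the names mixed and num.lgl are illustrative:

```r
# An atomic vector holds a single type, so mixing types
# coerces everything to the most general one (here: character).
mixed <- c(1, "Hello.", TRUE)
mixed
class(mixed)

# Without a character value, the logical is coerced to a number instead.
num.lgl <- c(1, TRUE)
num.lgl
class(num.lgl)
```

Because the range of "Hello." is a character vector, the numeric and logical ranges in the matrix are converted to character strings as well.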
Any data.frame is also a list, where each column is one list element.
class(cats)
## [1] "data.frame"
is.list(cats)
## [1] TRUE
print.default(cats[1:10, ])
## $Sex
##  [1] F F F F F F F F F F
## Levels: F M
## 
## $Bwt
##  [1] 2.0 2.0 2.0 2.1 2.1 2.1 2.1 2.1 2.1 2.1
## 
## $Hwt
##  [1] 7.0 7.4 9.5 7.2 7.3 7.6 8.1 8.2 8.3 8.5
## 
## attr(,"class")
## [1] "data.frame"
This means we can use sapply() on data frames as well, which is often useful.
sapply(cats, class)
##       Sex       Bwt       Hwt 
##  "factor" "numeric" "numeric"
lapply()
lapply() works like sapply(), but it always returns a list of the same length as the input, each element of which is the result of applying FUN to the corresponding element of the input.
lapply(cats, class)
## $Sex
## [1] "factor"
## 
## $Bwt
## [1] "numeric"
## 
## $Hwt
## [1] "numeric"
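Because a list can hold elements of different types, lapply() does not need to coerce its results into one common type. A brief sketch that reuses the my.list example with range(); the name ranges is illustrative:

```r
my.list <- list(A = c(4, 2, 1:3), B = "Hello.", C = TRUE)

# Each range is stored as its own list element and keeps its own type.
ranges <- lapply(my.list, range)
ranges$A
class(ranges$A)
class(ranges$B)
```

Individual results are then retrieved with $ or [[ ]], just like the elements of any other list.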
Functions are reusable pieces of code that take an input, do some computation on the input, and return output.
We have been using a lot of functions: code of the form something() is usually a function.
mean(1:6)
## [1] 3.5
We can make our own functions as follows:
squared <- function (x){
  x.square <- x * x
  return(x.square)
}
squared(4)
## [1] 16
x, the input, is called the (formal) argument of the function. x.square is called the return value.
If there is no return(), the value of the last evaluated expression is automatically returned, so we can also just write:
squared <- function(x){
  x * x
}
squared(-2)
## [1] 4
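An explicit return() remains useful when a function should stop early. The following is only a sketch; safe.sqrt is a hypothetical helper invented for this illustration:

```r
# Hypothetical helper: return early for negative input,
# otherwise fall through to the last expression.
safe.sqrt <- function(x){
  if (x < 0){
    return(NA)   # explicit return() leaves the function immediately
  }
  sqrt(x)        # value of the last expression is returned implicitly
}
safe.sqrt(16)
safe.sqrt(-4)
```

The first call reaches the last line and returns its value; the second call leaves the function at return(NA) and never evaluates sqrt().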
We can also combine this with apply()
rand.mat
##            [,1]       [,2]      [,3]       [,4]       [,5]        [,6]
## [1,] -0.5495413  1.6040110 0.2065810 -0.4313313  0.8031660  0.04598861
## [2,] -0.4558570 -0.0929378 0.7613544  1.1739431 -0.5310347  0.45647626
## [3,] -0.9278759  0.5931129 1.7889414  1.0087794  0.3678748 -0.96723643
##             [,7]
## [1,]  0.38257922
## [2,]  0.02234608
## [3,] -0.75743886
apply(rand.mat, 2, squared)
##           [,1]        [,2]      [,3]      [,4]      [,5]        [,6]
## [1,] 0.3019956 2.572851154 0.0426757 0.1860467 0.6450756 0.002114953
## [2,] 0.2078056 0.008637434 0.5796606 1.3781424 0.2819978 0.208370575
## [3,] 0.8609536 0.351782879 3.2003112 1.0176358 0.1353319 0.935546313
##              [,7]
## [1,] 0.1463668596
## [2,] 0.0004993472
## [3,] 0.5737136215
areDivisible <- function(x, y){
  if (x > y){
    larger <- x
    smaller <- y
  } else {
    larger <- y
    smaller <- x
  }
  remainder <- larger %% smaller
  remainder == 0
}
areDivisible(3, 10)
## [1] FALSE
areDivisible(3, 9)
## [1] TRUE
areDivisible(9, 3)
## [1] TRUE
Many functions provide default values for some of their arguments.
A default lets the user of the function supply an option when they want to; if they make no choice, the value chosen by the author of the function is used.
areDivisible <- function(x, y, printInput = TRUE){
  if (printInput){
    cat("Testing if", x, "and", y, "are divisible: \n")
  }
  ((x %% y) == 0) | ((y %% x) == 0)
}
areDivisible(3, 10)
## Testing if 3 and 10 are divisible:
## [1] FALSE
areDivisible(3, 9, printInput = TRUE)
## Testing if 3 and 9 are divisible:
## [1] TRUE
areDivisible(9, 3, printInput = FALSE)
## [1] TRUE
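The same mechanism works for any kind of argument, not only logical switches. A minimal sketch with a hypothetical function raise(), whose argument power is given a default of 2:

```r
# Hypothetical function: `power` defaults to 2 unless the caller overrides it.
raise <- function(x, power = 2){
  x ^ power
}
raise(3)                 # uses the default, power = 2
raise(3, power = 3)      # overrides the default
raise(power = 3, x = 2)  # named arguments can be given in any order
```

Arguments without a default, such as x here, must be supplied by the caller whenever the function actually uses them.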
Your first self-written for-loop, or function, will probably not work.
Don’t panic! Just go line-by-line, keeping track of what is currently inside each variable.
rand.vec <- rnorm(10)
# Add ten to each element.
for (i in 1:rand.vec){
  i <- i + 10
}
## Warning in 1:rand.vec: numerical expression has 10 elements: only the first used
rand.vec
##  [1] -0.86272069 -0.11600941  2.01116949  0.09772563 -0.69188855 -1.02612582
##  [7]  0.60817253 -1.09196830  2.19492100  1.37205518
# Add ten to each element.
for (i in 1:rand.vec){
  i <- i + 10
}
1:rand.vec
## Warning in 1:rand.vec: numerical expression has 10 elements: only the first used
## [1] 1 0
1:length(rand.vec)
## [1] 1 2 3 4 5 6 7 8 9 10
Let’s try again:
# Add ten to each element.
for (i in 1:length(rand.vec)){
  i <- i + 10
}
rand.vec
##  [1] -0.86272069 -0.11600941  2.01116949  0.09772563 -0.69188855 -1.02612582
##  [7]  0.60817253 -1.09196830  2.19492100  1.37205518
Still, no luck.
# Add ten to each element.
for (i in 1:length(rand.vec)){
  i <- i + 10
}
i <- 1
i
## [1] 1
# Add ten to each element.
for (i in 1:length(rand.vec)){
  i <- i + 10
}
i <- i + 10
i
## [1] 11
i <- 2
i
## [1] 2
# Add ten to each element.
for (i in 1:length(rand.vec)){
  rand.vec[i] <- rand.vec[i] + 10
}
rand.vec
##  [1]  9.137279  9.883991 12.011169 10.097726  9.308111  8.973874 10.608173
##  [8]  8.908032 12.194921 11.372055
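The loop is a good debugging exercise, but arithmetic in R is vectorised, so the same result can usually be obtained without a loop. A minimal sketch; the set.seed() call and the names loop.result and vec.result are additions for reproducibility and illustration, so the numbers differ from those shown above:

```r
set.seed(1)  # assumed seed, only so that both versions see the same numbers
rand.vec <- rnorm(10)

# Loop version: update one element at a time through its index.
loop.result <- rand.vec
for (i in seq_along(loop.result)){
  loop.result[i] <- loop.result[i] + 10
}

# Vectorised version: the scalar 10 is recycled across all elements.
vec.result <- rand.vec + 10

all.equal(loop.result, vec.result)
```

Both versions give the same vector; the vectorised form is shorter and leaves no index to get wrong.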