Begin this practical exercise by setting the maximum line length in RStudio to 80 characters. Go to RStudio's Preferences (or Global Options under Tools) > Code > Display, and tick the show margin box. Make sure that the margin column is set to 80.
Install package mice. Go to Tools > Install Packages in RStudio. If you are connected to the internet, select Repository under Install From and type mice under Packages. Leave the Install to Library at default and make sure that Install Dependencies is selected. Click install. If you are not connected to the internet, select Package Archive File under "Install from" and navigate to the respective file on your drive.
Some packages depend on other packages, meaning that their functionality may be limited if their dependencies are not installed. Installing dependencies is therefore recommended, but internet connectivity is required.
If all is right, you will receive a message in the console that the package has been installed (as well as its dependencies).
Alternatively, if you know the name of the package you would like to install, in this case mice, you can also call install.packages("mice") in the console window.
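In a script that others will run, the installation step can be guarded so that nothing is downloaded when the package is already present. The following is a minimal sketch; the helper name is_installed is hypothetical and not part of this practical.

```r
# Hypothetical helper (an assumption, not from the practical): returns TRUE
# if the package's namespace can be loaded, without attaching the package.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# stats ships with every R installation, so this check should succeed.
is_installed("stats")

# Only point to the installation step when mice is missing.
if (!is_installed("mice")) {
  message("mice is not installed; run install.packages(\"mice\") first.")
}
```

The check uses requireNamespace() rather than library(), so testing for a package does not change the search path.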
Load package mice. Loading packages can be done through functions library() and require().
library(mice)
##
## Attaching package: 'mice'
## The following object is masked from 'package:stats':
##
## filter
## The following objects are masked from 'package:base':
##
## cbind, rbind
If you use require() within a function, and the required package is not available, require() will yield a warning and the remainder of the function is still executed, whereas library() will yield an error and terminate all executions. The use of library() is preferred when you are not doing anything too complicated: require() results in slightly more computational overhead because it calls library() itself.
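The difference between the two functions can be made visible with a package name that does not exist. This is a small sketch; the package name below is deliberately made up.

```r
# A made-up package name, assumed not to be installed anywhere.
pkg <- "aPackageThatDoesNotExist123"

# library() signals an error, which is caught here and turned into a label.
res_library <- tryCatch(
  library(pkg, character.only = TRUE),
  error = function(e) "error"
)

# require() returns FALSE instead of stopping, so code after it keeps running.
res_require <- suppressWarnings(
  require(pkg, character.only = TRUE, quietly = TRUE)
)

res_library
res_require
```

Because require() returns a logical value, its result can be used in an if statement to decide what to do when a package is missing.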
Open the mammalsleep dataset from package mice by typing mammalsleep in the console, and subsequently by using the function View(). Using View() is preferred for inspecting datasets that are large. View() opens the dataset in a spreadsheet-like window (similar to MS Excel or SPSS). If you View() your own datasets, you can even edit the datasets' contents.
Write the mammalsleep dataset from package mice to the work directory (= the directory of your R project) as a tab-delimited text file with . as a decimal separator. Use the function write.table() and name the file mammalsleep.txt.
library(mice)
write.table(mammalsleep, "mammalsleep.txt", sep = "\t", dec = ".", row.names = FALSE)
The argument sep = "\t" indicates that the file is tab-delimited and the argument dec = "." indicates that a point is used as the decimal separator (instead of a comma). The argument row.names = FALSE tells R that row names are not to be included in the exported file.
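To see what these arguments do to the file itself, a tiny data frame can be written to a temporary file and read back as raw text. This is only a sketch: the object names toy and tmp are assumptions, and the temporary file keeps the work directory clean.

```r
# A two-row toy data frame standing in for mammalsleep (values for illustration).
toy <- data.frame(species = c("Cat", "Cow"), bw = c(3.3, 465))

# Write to a temporary file with the same arguments as in the exercise.
tmp <- tempfile(fileext = ".txt")
write.table(toy, tmp, sep = "\t", dec = ".", row.names = FALSE)

# Read the raw lines: one header line plus one line per row,
# with a tab between the columns and no row names in front.
raw_lines <- readLines(tmp)
raw_lines
```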
Import the mammalsleep.txt file with read.table().
sleepdata <- read.table("mammalsleep.txt", sep = "\t", dec = ".", header = TRUE, stringsAsFactors = TRUE)
The argument sep = "\t" indicates that the file is tab-delimited and the argument dec = "." indicates that a point is used as the decimal separator (instead of a comma). The argument header = TRUE tells R that variable names are included in the header.
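A quick way to convince yourself that export and import mirror each other is a round trip on a small data frame. The sketch below uses made-up object names (toy, tmp, back) and a temporary file rather than the work directory.

```r
# Toy data standing in for mammalsleep (values for illustration only).
toy <- data.frame(species = c("Cat", "Cow"), bw = c(3.3, 465))

# Export with the same arguments as in the exercise ...
tmp <- tempfile(fileext = ".txt")
write.table(toy, tmp, sep = "\t", dec = ".", row.names = FALSE)

# ... and import again, reading the first line as variable names.
back <- read.table(tmp, sep = "\t", dec = ".", header = TRUE,
                   stringsAsFactors = TRUE)

names(back)             # variable names come from the header line
is.factor(back$species) # text columns become factors
all.equal(back$bw, toy$bw)
```

With stringsAsFactors = TRUE the text column is converted to a factor on import, which is why the species column in sleepdata is categorical.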
All files that are present in the work directory of the current R project can be imported into the workspace (the space that contains all environments) directly, using only the file name. Files in all other locations require you to specify the full path from the root of your machine. To find out what the current work directory is, you can type getwd(), and to change the work directory you can use setwd(). The beauty of using projects in RStudio is that you never have to change the work directory, as it is automatically set relative to your project's R scripts.
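The relation between a bare file name and a full path can be sketched as follows. The object names are assumptions, and the file mammalsleep.txt only exists if you completed the export step.

```r
wd <- getwd()                           # the current work directory
in_project <- "mammalsleep.txt"         # relative name, resolved against wd
full_path <- file.path(wd, in_project)  # the same file, spelled out in full

# Both spellings refer to the same file, so they agree on whether it exists.
file.exists(in_project) == file.exists(full_path)
```

Using file.path() instead of pasting strings together avoids having to type the path separator yourself.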
There are many packages that facilitate importing data sets from other statistical software packages, such as SPSS (e.g. function read_spss() from package haven), Mplus (package MplusAutomation), Stata (read.dta() in foreign), SAS (sasxport.get() from package Hmisc) and from spreadsheet software, such as MS Excel (function read.xlsx() from package xlsx). For a short guideline to import multiple formats into R, see e.g. http://www.statmethods.net/input/importingdata.html.
If you would like to know more about this dataset, you can open the help for the mammalsleep dataset in package mice through ?mammalsleep.
Inspecting the sleepdata could be done by
str(sleepdata) # the data structure
summary(sleepdata) #distributional summaries
## species bw brw
## African elephant : 1 Min. : 0.005 Min. : 0.14
## African giant pouched rat: 1 1st Qu.: 0.600 1st Qu.: 4.25
## Arctic Fox : 1 Median : 3.342 Median : 17.25
## Arctic ground squirrel : 1 Mean : 198.790 Mean : 283.13
## Asian elephant : 1 3rd Qu.: 48.202 3rd Qu.: 166.00
## Baboon : 1 Max. :6654.000 Max. :5712.00
## (Other) :56
## sws ps ts mls
## Min. : 2.100 Min. :0.000 Min. : 2.60 Min. : 2.000
## 1st Qu.: 6.250 1st Qu.:0.900 1st Qu.: 8.05 1st Qu.: 6.625
## Median : 8.350 Median :1.800 Median :10.45 Median : 15.100
## Mean : 8.673 Mean :1.972 Mean :10.53 Mean : 19.878
## 3rd Qu.:11.000 3rd Qu.:2.550 3rd Qu.:13.20 3rd Qu.: 27.750
## Max. :17.900 Max. :6.600 Max. :19.90 Max. :100.000
## NA's :14 NA's :12 NA's :4 NA's :4
## gt pi sei odi
## Min. : 12.00 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.: 35.75 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.000
## Median : 79.00 Median :3.000 Median :2.000 Median :2.000
## Mean :142.35 Mean :2.871 Mean :2.419 Mean :2.613
## 3rd Qu.:207.50 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :645.00 Max. :5.000 Max. :5.000 Max. :5.000
## NA's :4
round(cor(sleepdata[, -1], use = "pairwise.complete.obs"), 2) #bivariate correlations, variable 1 excluded.
## bw brw sws ps ts mls gt pi sei odi
## bw 1.00 0.93 -0.38 -0.11 -0.31 0.30 0.65 0.06 0.34 0.13
## brw 0.93 1.00 -0.37 -0.11 -0.36 0.51 0.75 0.03 0.37 0.15
## sws -0.38 -0.37 1.00 0.51 0.96 -0.38 -0.59 -0.32 -0.54 -0.48
## ps -0.11 -0.11 0.51 1.00 0.73 -0.30 -0.45 -0.45 -0.54 -0.58
## ts -0.31 -0.36 0.96 0.73 1.00 -0.41 -0.63 -0.40 -0.64 -0.59
## mls 0.30 0.51 -0.38 -0.30 -0.41 1.00 0.61 -0.10 0.36 0.06
## gt 0.65 0.75 -0.59 -0.45 -0.63 0.61 1.00 0.20 0.64 0.38
## pi 0.06 0.03 -0.32 -0.45 -0.40 -0.10 0.20 1.00 0.62 0.92
## sei 0.34 0.37 -0.54 -0.54 -0.64 0.36 0.64 0.62 1.00 0.79
## odi 0.13 0.15 -0.48 -0.58 -0.59 0.06 0.38 0.92 0.79 1.00
head(mammalsleep) #first six rows
## species bw brw sws ps ts mls gt pi sei odi
## 1 African elephant 6654.000 5712.0 NA NA 3.3 38.6 645 3 5 3
## 2 African giant pouched rat 1.000 6.6 6.3 2.0 8.3 4.5 42 3 1 3
## 3 Arctic Fox 3.385 44.5 NA NA 12.5 14.0 60 1 1 1
## 4 Arctic ground squirrel 0.920 5.7 NA NA 16.5 NA 25 5 2 3
## 5 Asian elephant 2547.000 4603.0 2.1 1.8 3.9 69.0 624 3 5 4
## 6 Baboon 10.550 179.5 9.1 0.7 9.8 27.0 180 4 4 4
tail(mammalsleep) #last six rows
## species bw brw sws ps ts mls gt pi sei odi
## 57 Tenrec 0.900 2.6 11.0 2.3 13.3 4.5 60 2 1 2
## 58 Tree hyrax 2.000 12.3 4.9 0.5 5.4 7.5 200 3 1 3
## 59 Tree shrew 0.104 2.5 13.2 2.6 15.8 2.3 46 3 2 2
## 60 Vervet 4.190 58.0 9.7 0.6 10.3 24.0 210 4 3 4
## 61 Water opossum 3.500 3.9 12.8 6.6 19.4 3.0 14 2 1 1
## 62 Yellow-bellied marmot 4.050 17.0 NA NA NA 13.0 38 3 1 1
?mammalsleep # the help
Note that the sleepdata dataset is automatically recognized as a dataframe. There is one factor (categorical variable) containing the animal names.
The functions head() and tail() are very useful, as is function str(), which gives you a quick overview of the measurement levels in mammalsleep.
Since mammalsleep is an R data set, there should be a help file. Taking a look at ?mammalsleep provides information about the measurements and origin of the variables.
One thing that may have caught your attention is the relation between ts, ps and sws. This is a deterministic relation where total sleep (ts) is the sum of paradoxical sleep (ps) and slow-wave sleep (sws). In the event that you would model the data, you need to take such relations into account.
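Such a relation is easy to verify numerically. The sketch below uses four rows copied from the head() output shown earlier (African elephant, African giant pouched rat, Asian elephant and Baboon) rather than the full data set; the object names are assumptions.

```r
# Four rows taken from the head() output; the first has missing sws and ps.
toy <- data.frame(
  sws = c(NA, 6.3, 2.1, 9.1),
  ps  = c(NA, 2.0, 1.8, 0.7),
  ts  = c(3.3, 8.3, 3.9, 9.8)
)

# Only rows without missing values can be checked.
complete <- complete.cases(toy)

# Largest absolute difference between sws + ps and ts on the complete rows.
max_gap <- with(toy[complete, ], max(abs(sws + ps - ts)))
max_gap < 1e-8
```

The comparison uses a small tolerance rather than ==, because sums of decimal numbers are not always represented exactly.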
Save the current workspace as Practical_C.RData. Also, save the sleepdata file as a separate workspace called Sleepdata.RData.
Now that we have imported our data, it may be wise to save the current workspace, i.e. the current state of affairs. Saving the workspace will leave everything as is, so that we can continue from this exact state at a later time by simply opening the workspace file. To save everything in the current workspace, type:
# To save the entire workspace:
save.image("Practical_C.RData")
To save just the data set sleepdata, and nothing else, type:
# To save the data set only.
save(sleepdata, file = "Sleepdata.RData")
With the save functions, any object in the workspace can be saved.
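Saved objects come back under their original names with load(). The following sketch saves a toy object to a temporary file and restores it into a separate environment, so that nothing in the current workspace is overwritten; all object names are assumptions.

```r
# A toy object and a temporary .RData file (not the files from the exercise).
toy <- data.frame(x = 1:3)
tmp <- tempfile(fileext = ".RData")
save(toy, file = tmp)

# load() restores objects by name; a fresh environment keeps them
# apart from objects with the same name in the workspace.
restored_env <- new.env()
restored_names <- load(tmp, envir = restored_env)

restored_names                     # names of the restored objects
identical(restored_env$toy, toy)   # same content as what was saved
```

Calling load() without the envir argument restores the objects directly into the current workspace, silently replacing any objects that have the same name.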
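Exclude the animals Echidna, Lesser short-tailed shrew and Musk shrew from the data set and save the result as sleepdata2. Tip: use the square brackets to indicate [rows, columns] or use the function filter() from dplyr.
There are three ways to exclude the three animals from the data set. The first approach uses the names: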
exclude <- c("Echidna", "Lesser short-tailed shrew", "Musk shrew")
which <- sleepdata$species %in% exclude #Indicate the species that match the names in exclude
which
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [37] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE
sleepdata2 <- sleepdata[!which, ]
The second approach uses the row numbers directly (you would need to look up or calculate the row numbers):
sleepdata2 <- sleepdata[-c(16, 32, 38), ]
Note that the numbered option requires less code, but the named option has a much lower probability of error. As the data set might change, or might get sorted differently, the row numbers in the second option may no longer be valid.
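The risk of hard-coded row numbers shows up as soon as the rows are reordered. The sketch below uses a four-row toy data set with made-up brw values, not the real sleepdata.

```r
# Toy data; the brw values are invented for illustration.
toy <- data.frame(
  species = factor(c("Cat", "Echidna", "Cow", "Musk shrew")),
  brw     = c(10, 20, 30, 40)
)
exclude <- c("Echidna", "Musk shrew")

# Reverse the row order, as a sort or an updated file might do.
shuffled <- toy[4:1, ]

# Exclusion by name still removes the intended animals ...
by_name <- shuffled[!shuffled$species %in% exclude, ]

# ... whereas the row numbers that were correct for toy (2 and 4)
# now remove Cow and Cat instead.
by_number <- shuffled[-c(2, 4), ]

as.character(by_name$species)
as.character(by_number$species)
```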
The third approach uses function filter() from package dplyr:
library(dplyr) # Data Manipulation
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
dplyr::filter(sleepdata, !sleepdata$species %in% exclude) # ! negates the match: TRUE becomes FALSE and vice versa
## species bw brw sws ps ts mls gt pi sei
## 1 African elephant 6654.000 5712.00 NA NA 3.3 38.6 645 3 5
## 2 African giant pouched rat 1.000 6.60 6.3 2.0 8.3 4.5 42 3 1
## 3 Arctic Fox 3.385 44.50 NA NA 12.5 14.0 60 1 1
## 4 Arctic ground squirrel 0.920 5.70 NA NA 16.5 NA 25 5 2
## 5 Asian elephant 2547.000 4603.00 2.1 1.8 3.9 69.0 624 3 5
## 6 Baboon 10.550 179.50 9.1 0.7 9.8 27.0 180 4 4
## 7 Big brown bat 0.023 0.30 15.8 3.9 19.7 19.0 35 1 1
## 8 Brazilian tapir 160.000 169.00 5.2 1.0 6.2 30.4 392 4 5
## 9 Cat 3.300 25.60 10.9 3.6 14.5 28.0 63 1 2
## 10 Chimpanzee 52.160 440.00 8.3 1.4 9.7 50.0 230 1 1
## 11 Chinchilla 0.425 6.40 11.0 1.5 12.5 7.0 112 5 4
## 12 Cow 465.000 423.00 3.2 0.7 3.9 30.0 281 5 5
## 13 Desert hedgehog 0.550 2.40 7.6 2.7 10.3 NA NA 2 1
## 14 Donkey 187.100 419.00 NA NA 3.1 40.0 365 5 5
## 15 Eastern American mole 0.075 1.20 6.3 2.1 8.4 3.5 42 1 1
## 16 European hedgehog 0.785 3.50 6.6 4.1 10.7 6.0 42 2 2
## 17 Galago 0.200 5.00 9.5 1.2 10.7 10.4 120 2 2
## 18 Genet 1.410 17.50 4.8 1.3 6.1 34.0 NA 1 2
## 19 Giant armadillo 60.000 81.00 12.0 6.1 18.1 7.0 NA 1 1
## 20 Giraffe 529.000 680.00 NA 0.3 NA 28.0 400 5 5
## 21 Goat 27.660 115.00 3.3 0.5 3.8 20.0 148 5 5
## 22 Golden hamster 0.120 1.00 11.0 3.4 14.4 3.9 16 3 1
## 23 Gorilla 207.000 406.00 NA NA 12.0 39.3 252 1 4
## 24 Gray seal 85.000 325.00 4.7 1.5 6.2 41.0 310 1 3
## 25 Gray wolf 36.330 119.50 NA NA 13.0 16.2 63 1 1
## 26 Ground squirrel 0.101 4.00 10.4 3.4 13.8 9.0 28 5 1
## 27 Guinea pig 1.040 5.50 7.4 0.8 8.2 7.6 68 5 3
## 28 Horse 521.000 655.00 2.1 0.8 2.9 46.0 336 5 5
## 29 Jaguar 100.000 157.00 NA NA 10.8 22.4 100 1 1
## 30 Kangaroo 35.000 56.00 NA NA NA 16.3 33 3 5
## 31 Little brown bat 0.010 0.25 17.9 2.0 19.9 24.0 50 1 1
## 32 Man 62.000 1320.00 6.1 1.9 8.0 100.0 267 1 1
## 33 Mole rat 0.122 3.00 8.2 2.4 10.6 NA 30 2 1
## 34 Mountain beaver 1.350 8.10 8.4 2.8 11.2 NA 45 3 1
## 35 Mouse 0.023 0.40 11.9 1.3 13.2 3.2 19 4 1
## 36 N. American opossum 1.700 6.30 13.8 5.6 19.4 5.0 12 2 1
## 37 Nine-banded armadillo 3.500 10.80 14.3 3.1 17.4 6.5 120 2 1
## 38 Okapi 250.000 490.00 NA 1.0 NA 23.6 440 5 5
## 39 Owl monkey 0.480 15.50 15.2 1.8 17.0 12.0 140 2 2
## 40 Patas monkey 10.000 115.00 10.0 0.9 10.9 20.2 170 4 4
## 41 Phanlanger 1.620 11.40 11.9 1.8 13.7 13.0 17 2 1
## 42 Pig 192.000 180.00 6.5 1.9 8.4 27.0 115 4 4
## 43 Rabbit 2.500 12.10 7.5 0.9 8.4 18.0 31 5 5
## 44 Raccoon 4.288 39.20 NA NA 12.5 13.7 63 2 2
## 45 Rat 0.280 1.90 10.6 2.6 13.2 4.7 21 3 1
## 46 Red fox 4.235 50.40 7.4 2.4 9.8 9.8 52 1 1
## 47 Rhesus monkey 6.800 179.00 8.4 1.2 9.6 29.0 164 2 3
## 48 Rock hyrax (Hetero. b) 0.750 12.30 5.7 0.9 6.6 7.0 225 2 2
## 49 Rock hyrax (Procavia hab) 3.600 21.00 4.9 0.5 5.4 6.0 225 3 2
## 50 Roe deer 14.830 98.20 NA NA 2.6 17.0 150 5 5
## 51 Sheep 55.500 175.00 3.2 0.6 3.8 20.0 151 5 5
## 52 Slow loris 1.400 12.50 NA NA 11.0 12.7 90 2 2
## 53 Star nosed mole 0.060 1.00 8.1 2.2 10.3 3.5 NA 3 1
## 54 Tenrec 0.900 2.60 11.0 2.3 13.3 4.5 60 2 1
## 55 Tree hyrax 2.000 12.30 4.9 0.5 5.4 7.5 200 3 1
## 56 Tree shrew 0.104 2.50 13.2 2.6 15.8 2.3 46 3 2
## 57 Vervet 4.190 58.00 9.7 0.6 10.3 24.0 210 4 3
## 58 Water opossum 3.500 3.90 12.8 6.6 19.4 3.0 14 2 1
## 59 Yellow-bellied marmot 4.050 17.00 NA NA NA 13.0 38 3 1
## odi
## 1 3
## 2 3
## 3 1
## 4 3
## 5 4
## 6 4
## 7 1
## 8 4
## 9 1
## 10 1
## 11 4
## 12 5
## 13 2
## 14 5
## 15 1
## 16 2
## 17 2
## 18 1
## 19 1
## 20 5
## 21 5
## 22 2
## 23 1
## 24 1
## 25 1
## 26 3
## 27 4
## 28 5
## 29 1
## 30 4
## 31 1
## 32 1
## 33 1
## 34 3
## 35 3
## 36 1
## 37 1
## 38 5
## 39 2
## 40 4
## 41 2
## 42 4
## 43 5
## 44 2
## 45 3
## 46 1
## 47 2
## 48 2
## 49 3
## 50 5
## 51 5
## 52 2
## 53 2
## 54 2
## 55 3
## 56 2
## 57 4
## 58 1
## 59 1
To plot brain weight as a function of species:
plot(brw ~ species, data = sleepdata2)
To find out which animals have a brain weight larger than 1 standard deviation above the mean brain weight:
sd.brw <- sd(sleepdata2$brw) # standard deviation
mean.brw <- mean(sleepdata2$brw) # mean
which <- sleepdata2$brw > (mean.brw + (1 * sd.brw)) # which are larger?
as.character(sleepdata2$species[which]) # names of the animals with brw > mean + 1 sd
## [1] "African elephant" "Asian elephant" "Man"
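The same selection can be traced by hand on a handful of animals. The sketch below uses the brain weights of the six animals from the head() output, so the mean, the standard deviation and therefore the selected animals differ from those for the full data set; the object names are assumptions.

```r
# Brain weights of the six animals shown by head().
species <- c("African elephant", "African giant pouched rat", "Arctic Fox",
             "Arctic ground squirrel", "Asian elephant", "Baboon")
brw <- c(5712, 6.6, 44.5, 5.7, 4603, 179.5)

# Threshold: one standard deviation above the mean of these six values.
threshold <- mean(brw) + 1 * sd(brw)

# Logical vector marking the animals above the threshold,
# used as an index to pick out their names.
above <- brw > threshold
big_brains <- species[above]
big_brains
```

If brw contained missing values, mean() and sd() would return NA unless na.rm = TRUE is supplied; in this data set brw is completely observed.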
To plot these animals:
plot(brw ~ species, data = sleepdata2[which, ])
The downside is that it still prints all the animals on the x-axis. This is due to the factor levels for species being carried over to the smaller subset of the data, and plot() automatically uses all of these levels as labels. For example,
sleepdata2$species[which]
## [1] African elephant Asian elephant Man
## 62 Levels: African elephant African giant pouched rat ... Yellow-bellied marmot
returns only 3 mammals, but still has 62 factor levels. To get rid of the unused factor levels, we can use function factor():
sleepdata3 <- sleepdata2[which, ]
sleepdata3$species <- factor(sleepdata3$species)
sleepdata3$species
## [1] African elephant Asian elephant Man
## Levels: African elephant Asian elephant Man
To plot the graph that we wanted:
plot(brw ~ species, data = sleepdata3)
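Base R also offers droplevels() for the same purpose, which can be applied to a single factor or to a whole data frame at once. The following is a small sketch on a toy factor, not on the sleep data.

```r
# A factor with three levels, of which only two are kept in the subset.
f <- factor(c("a", "b", "c"))
f_sub <- f[c(1, 3)]

n_before  <- nlevels(f_sub)              # the unused level "b" is still there
n_factor  <- nlevels(factor(f_sub))      # re-creating the factor drops it
n_dropped <- nlevels(droplevels(f_sub))  # droplevels() does the same

c(n_before, n_factor, n_dropped)
```

Applied to a data frame, droplevels() cleans every factor column in one call, which can be convenient when a subset contains more than one categorical variable.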
If your current software-analysis platform is different from R, chances are that you prepare your data in the software of your choice. In R there are fantastic facilities for importing and exporting data and I would specifically like to point you to package haven by Hadley Wickham. It provides wonderful functions to import and export many data types from software such as Stata, SAS and SPSS. For integrating Mplus into R, package MplusAutomation is essential.