Open the “Solutions” file and download the “Template” file (the R Markdown file).
Start with reading the “Solutions” file and try to understand what happens in each code chunk. If you do not understand what the code does, use the following possible strategies:
#
to start a comment)Run All
option in
the Run
menu (upper right in the editor pane). Run the code
again, but first change the seed at line 21. What do you observe? How
many confidence intervals cover the population mean value = 0?We start by fixing the random seed.
set.seed(123)
Then, we draw 100 samples from a standard normal distribution, i.e. a distribution with \(\mu=0\) and \(\sigma^2=1\), such that for any drawn sample X we could write \(X ∼ \mathcal{N}(0,1)\). No specification about the size of the samples is explicitly requested, so for computational reasons a mere 5000 cases would suffice to obtain a detailed approximation.
library(plyr)
samples <- plyr::rlply(.n = 100, rnorm(n = 5000, mean = 0, sd = 1))
We use the plyr::rlply()
function to draw the 100
samples and return the resulting output as a list.
Find the help file of rlply()
and
rnorm()
and see if you understand what happens in the above
code chunk.
Further, we extract the following sources of information for each sample:
info <- function(x){
M <- mean(x)
DF <- length(x) - 1
SE <- 1 / sqrt(length(x))
INT <- qt(.975, DF) * SE
return(c(M, M - 0, SE, M - INT, M + INT))
}
format <- c("Mean" = 0, "Bias" = 0, "Std.Err" = 0, "Lower" = 0, "Upper" = 0)
We can then proceed by creating a piped process. The following code
is written with the pipe functionality from package magrittr - FYI:
dplyr
adopted magrittr
, so dplyr
would also work here.
require(magrittr)
results <- samples %>%
vapply(., info, format) %>%
t()
Because object samples
is a list, we can execute
function vapply()
to obtain a numerical object with the
results of function info()
. vapply()
allows
you to return output to a pre-defined format. Function
t()
is used to obtain the transpose of
vapply()
’s return - the resulting object has all
information in the columns.
To create an indicator for the inclusion of the population value \(\mu=0\) in the confidence interval, we can add the following coverage column cov to the data:
results <- results %>%
as.data.frame() %>%
mutate(Covered = Lower < 0 & 0 < Upper)
Converting the numerical object to an object of class data.frame
allows for a more convenient calling of elements. Now we can simply take
the column means over the data frame results to obtain the average of
the estimates returned by info()
.
colMeans(results)
## Mean Bias Std.Err Lower Upper
## 5.619577e-05 5.619577e-05 1.414214e-02 -2.766859e-02 2.778098e-02
## Covered
## 9.400000e-01
We can see that 94 out of the 100 samples cover the population value.
To identify the samples for which the population value is not
covered, we can use column cov
as it is already a logical
evaluation.
table <- results[!results$Covered, ]
To present this info as a table, package kableExtra
is a
wonderful extension to use with both rmarkdown and LATEX.
require(knitr)
require(kableExtra)
kable(table)
Mean | Bias | Std.Err | Lower | Upper | Covered | |
---|---|---|---|---|---|---|
35 | 0.0305931 | 0.0305931 | 0.0141421 | 0.0028683 | 0.0583179 | FALSE |
52 | -0.0314772 | -0.0314772 | 0.0141421 | -0.0592020 | -0.0037524 | FALSE |
56 | 0.0307159 | 0.0307159 | 0.0141421 | 0.0029911 | 0.0584407 | FALSE |
75 | 0.0359120 | 0.0359120 | 0.0141421 | 0.0081872 | 0.0636367 | FALSE |
83 | -0.0284521 | -0.0284521 | 0.0141421 | -0.0561769 | -0.0007273 | FALSE |
95 | -0.0331512 | -0.0331512 | 0.0141421 | -0.0608759 | -0.0054264 | FALSE |
kable(table, "html") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = F,
position = "float_right")
Mean | Bias | Std.Err | Lower | Upper | Covered | |
---|---|---|---|---|---|---|
35 | 0.0305931 | 0.0305931 | 0.0141421 | 0.0028683 | 0.0583179 | FALSE |
52 | -0.0314772 | -0.0314772 | 0.0141421 | -0.0592020 | -0.0037524 | FALSE |
56 | 0.0307159 | 0.0307159 | 0.0141421 | 0.0029911 | 0.0584407 | FALSE |
75 | 0.0359120 | 0.0359120 | 0.0141421 | 0.0081872 | 0.0636367 | FALSE |
83 | -0.0284521 | -0.0284521 | 0.0141421 | -0.0561769 | -0.0007273 | FALSE |
95 | -0.0331512 | -0.0331512 | 0.0141421 | -0.0608759 | -0.0054264 | FALSE |
For an even more flexible presentation of tabulated results, the
graphical parameters for kable()
can be changed. For
example, the following code renders a table that does not span the width
of the page, is in a right-aligned floating container and has some
visually pleasing aesthetics, like striping, hovering (mouse pointer)
and is somewhat condensed.
However, you need to pay attention to the final result. Justified floats have a tendency to mess up the ‘natural’ flow of the text, unless the body of text is sufficiently large. For example, this whole paragraph serves that purpose: to create a sufficiently large body of text.
To create a graph that would serve the purpose of the exercise, one could think about the following graph:
require(ggplot2)
limits <- aes(ymax = Upper, ymin = Lower)
ggplot(results, aes(y=Mean, x=1:100, colour = Covered)) +
geom_hline(aes(yintercept = 0), color = "dark grey", linewidth = 2) +
geom_pointrange(limits) +
xlab("Simulations 1-100") +
ylab("Means and 95% Confidence Intervals")