Probability distributions in R
A review of statistical inference concepts
Statistical Programming with R
Normal distributions
Discrete distributions (counts)
Other continuous distributions
R has a consistent system for working with probability distributions. For example, for the normal distribution:
dnorm(): d for density function
pnorm(): p for cumulative distribution \(P(X \leqslant x)\)
qnorm(): q for p-quantile
rnorm(): r for pseudo-random normally distributed numbers
For other distributions, replace norm with: f, chisq, binom, pois, exp, unif, gamma, beta, t
Example: obtain critical z-value for a 2-sided 95% confidence interval:
qnorm(0.975) # default: mean=0, sd=1
## [1] 1.959964
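The same pattern applies to the other distributions. A brief sketch (the parameter values below are arbitrary choices for illustration):

dbinom(3, size = 10, prob = 0.5)   # P(X = 3) for a Binomial(10, 0.5)
pbinom(3, size = 10, prob = 0.5)   # P(X <= 3)
qt(0.975, df = 10)                 # 97.5% quantile of the t-distribution with 10 df
rpois(5, lambda = 2)               # 5 pseudo-random Poisson(2) counts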
Random sample
Population parameter
Sample statistic
The sampling distribution is the distribution of a sample statistic
Conditions for the CLT to hold
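As a minimal sketch of a sampling distribution, we can simulate many sample means from a skewed population and watch the CLT at work (the exponential population, n = 30, and 1000 replications below are arbitrary choices):

set.seed(1)
# draw 1000 samples of size 30 from an exponential population and store the means
means <- replicate(1000, mean(rexp(30, rate = 1)))
hist(means, main = "Sampling distribution of the mean (n = 30)",
     xlab = "sample mean")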
Statistical tests work with a null hypothesis, e.g.: \[ H_0:\mu=\mu_0 \]
\[ t_{(df)}=\frac{\bar{x}-\mu_0}{SEM} \]
The \(t\)-distribution has larger variance than the normal distribution (more uncertainty).
curve(dt(x, 100), -3, 3, ylab = "density")
curve(dnorm(x), -3, 3, ylab = "", add = TRUE, col = "orange", lty = 2)
curve(dt(x, 2), -3, 3, ylab = "", add = TRUE, col = "red")
curve(dt(x, 1), -3, 3, ylab = "", add = TRUE, col = "blue")
legend(1.8, .4, c("t(df=100)", "normal", "t(df=2)", "t(df=1)"),
       col = c("black", "orange", "red", "blue"), lty = 1)
The \(p\)-value in this situation is the probability, under \(H_0\), of observing a sample mean \(\bar{x}\) at least as far from \(\mu_0\) as the one actually observed:
We would reject \(H_0\) if \(p\) is smaller than the experimenter's predetermined significance level \(\alpha\):
Example of a two-sided test for \(t_{(df=10)}\): \(P(|t| \geqslant 2.228) = 5\%\), so the critical values are \(\pm 2.228\) (\(\alpha = 0.05\)).
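As a sketch, the two-sided \(p\)-value can be computed with pt(); the sample mean, \(\mu_0\), SEM, and df below are made-up values:

x.bar <- 7.9;  mu0 <- 7.0;  SEM <- 0.4;  df <- 10   # made-up values
t.stat <- (x.bar - mu0) / SEM                       # observed t statistic
2 * pt(-abs(t.stat), df)                            # two-sided p-value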
If an infinite number of samples were drawn and CIs computed, then the true population mean \(\mu\) would be contained in 95% of these intervals
\[ 95\%~CI=\bar{x}\pm{t}_{(1-\alpha/2)}\cdot SEM \]
Example
x.bar <- 7.6                      # sample mean
SEM   <- 2.1                      # standard error of the mean
n     <- 11                       # sample size
df    <- n - 1                    # degrees of freedom
alpha <- .15                      # significance level (here an 85% CI)
t.crit <- qt(1 - alpha / 2, df)   # t(1 - alpha / 2) for df = 10
c(x.bar - t.crit * SEM, x.bar + t.crit * SEM)
## [1] 4.325605 10.874395
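For comparison, when the raw data are available, t.test() returns such a confidence interval directly. A brief sketch (the simulated data below are only an illustration):

set.seed(1)
x <- rnorm(11, mean = 7.6, sd = 7)      # simulated data, n = 11
t.test(x, conf.level = 0.85)$conf.int   # 85% CI, matching alpha = .15 above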
To illustrate the concept of a confidence interval, see the following interactive applications:
A Shiny app: https://shiny.rit.albany.edu/stat/confidence/
An interactive visualisation: https://rpsychologist.com/d3/ci/
Both applications show the random process of repeatedly estimating a 95% confidence interval. The interpretation of a 95% CI becomes clearer when you see the results of estimating a 95% CI for a sample mean 100 times.
During the practical you will work with an R Markdown file that simulates the estimation of 100 confidence intervals for the mean.
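A minimal sketch of such a simulation (the population parameters, sample size, and number of intervals below are arbitrary choices):

set.seed(123)
mu <- 0;  sigma <- 1;  n <- 25
covered <- replicate(100, {
  x  <- rnorm(n, mu, sigma)
  ci <- mean(x) + qt(c(0.025, 0.975), n - 1) * sd(x) / sqrt(n)
  ci[1] <= mu & mu <= ci[2]                 # does this interval cover mu?
})
mean(covered)                               # proportion of intervals containing mu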
Exercises
(Use # to start a comment.) Run the whole script with the Run All option in the Run menu (upper right in the editor pane).
Run the code again, but first change the seed at line 21. What do you observe?