- Drawing data from distributions
- Sampling, random seed, pseudo-random number generator
Statistical Programming with R
library(mvtnorm) # Multivariate Normal tools
table()
: Build a contingency table out of countshist()
: Draw a histogrammvtnorm::rmvnorm()
: Drawing values from a multivariate normal distributionsample()
: Probabilistic sampling with or without replacementrchisq()
: Probabilistic sampling from a \(\chi^2\) distributionrpois()
: Probabilistic sampling from a poisson distributionset.seed(123)
: Fixing the Random Number Generator seedWe saw that we could create vectors
c(1, 2, 3, 4, 3, 2, 1)
## [1] 1 2 3 4 3 2 1
c(1:4, 3:1)
## [1] 1 2 3 4 3 2 1
We could create matrices
matrix(1:15, nrow = 3, ncol = 5, byrow = TRUE)
## [,1] [,2] [,3] [,4] [,5] ## [1,] 1 2 3 4 5 ## [2,] 6 7 8 9 10 ## [3,] 11 12 13 14 15
Create a vector from a matrix:
mat <- matrix(data=1:18, nrow=3, ncol=6) mat
## [,1] [,2] [,3] [,4] [,5] [,6] ## [1,] 1 4 7 10 13 16 ## [2,] 2 5 8 11 14 17 ## [3,] 3 6 9 12 15 18
c(mat)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Of the same size,
sample(x=mat)
## [1] 15 7 16 11 4 13 9 6 8 10 3 12 17 5 18 14 2 1
# Besides x=mat, no other arguments are specified, therefore the function uses # the defaults: # If size is not specified, size = number of elements in x # replace: default value is FALSE
smaller size,
sample(mat, size = 5)
## [1] 15 5 9 18 16
or larger size
sample(mat, size = 50, replace = TRUE)
## [1] 10 15 13 3 16 14 14 10 14 8 12 7 12 10 18 1 11 7 3 6 17 9 13 8 6 ## [26] 8 14 15 15 2 1 16 12 1 17 8 11 4 11 1 4 11 11 1 10 17 4 6 13 7
hist(sample(mat, size = 50000, replace=TRUE), breaks = 0:18)
probs <- c(rep(.01, 15), .1, .25, .50) probs
## [1] 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 ## [16] 0.10 0.25 0.50
hist(sample(mat, size = 50000, prob = probs, replace=TRUE), breaks = 0:18)
set.seed(123) sample(mat, size = 50, replace = TRUE)
## [1] 15 14 3 10 18 11 5 14 5 9 3 8 7 10 9 4 14 17 11 7 12 15 10 13 7 ## [26] 9 9 10 7 6 2 5 8 12 13 18 1 6 15 9 15 16 6 11 8 7 16 17 18 17
set.seed(123) sample(mat, size = 50, replace = TRUE)
## [1] 15 14 3 10 18 11 5 14 5 9 3 8 7 10 9 4 14 17 11 7 12 15 10 13 7 ## [26] 9 9 10 7 6 2 5 8 12 13 18 1 6 15 9 15 16 6 11 8 7 16 17 18 17
set.seed(123) sample(mat, size = 5, replace = TRUE)
## [1] 15 14 3 10 18
sample(mat, size = 7, replace = TRUE)
## [1] 11 5 14 5 9 3 8
set.seed(123) sample(mat, size = 5, replace = TRUE)
## [1] 15 14 3 10 18
sample(mat, size = 7, replace = TRUE)
## [1] 11 5 14 5 9 3 8
The random seed is a number used to initialize the pseudo-random number generator
If replication is needed, pseudo-random number generators must be used
For truly random numbers, visit https://www.random.org/ (random numbers are generated based on atmosperic noise)
The initial value (the seed) itself does not need to be random.
When an R
instance is started, there is initially no seed. In that case, R
will create one from the current time and process ID.
If we fix the random seed we can exactly replicate the random process
If the method has not changed:
the results of the process will be identical when using the same seed.
We can draw data from a standard normal distribution
hist(rnorm(1000, mean = 0, sd = 1))
We can draw data from a specific normal distribution
hist(rnorm(1000, mean = 50, sd = 15))
hist(rnorm(100000, mean = 50, sd = 15))
\(\chi^2\)-distribution with \(df = 1\) degrees of freedom
hist(rchisq(500, df = 1))
\(\chi^2\)-distribution with \(df = 100\) degrees of freedom
hist(rchisq(500, df = 100))
Poisson distribution with mean \(\lambda = 1\)
hist(rpois(5000, lambda = 1))
Poisson distribution with mean \(\lambda = 10\)
hist(rpois(5000, lambda = 100))
Binomial distribution: 1000 draws of 1 trial with \(P(\text{success}=.75)\)
run <- rbinom(1000, size = 1, prob = .75) table(run)
## run ## 0 1 ## 247 753
Binomial distribution: 10.000 draws of 60 trials with \(P(\text{success}=.75)\)
run <- rbinom(10000, size = 60, prob = .75) table(run)
## run ## 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 ## 1 7 10 21 29 67 137 257 395 615 783 981 1096 1195 1136 1053 ## 48 49 50 51 52 53 54 55 56 ## 836 572 376 200 136 63 25 7 2
sigma <- matrix(c(30, 15, 15, 30), 2, 2) sigma
## [,1] [,2] ## [1,] 30 15 ## [2,] 15 30
means <- c(97, 63) data <- rmvnorm(n = 1000, mean = means, sigma = sigma) colMeans(data)
## [1] 96.86905 62.77418
var(data)
## [,1] [,2] ## [1,] 26.98297 14.28588 ## [2,] 14.28588 31.07523
plot(data)
sigma <- matrix(c(1, 0, 0, 1), 2, 2) sigma
## [,1] [,2] ## [1,] 1 0 ## [2,] 0 1
means <- c(0, 0) data <- rmvnorm(n = 1000, mean = means, sigma = sigma) colMeans(data)
## [1] -0.004211069 0.007496766
var(data)
## [,1] [,2] ## [1,] 0.869919122 -0.002239893 ## [2,] -0.002239893 1.074986477
plot(data)
par(mfrow = c(1, 2)) hist(data[, 1]) hist(data[, 2])
Practical B Solutions
file to check the answers.Practical B Template
file to write the code.