1.3 Estimates of Variability
- Variability (aka dispersion) = how tightly clustered or widely spread out the values are
1.3.1 SD & Friends
- Variance = average of squared deviations from the mean (dividing by \(n-1\) rather than \(n\)), \(s^2 = \frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^2}{n-1}\)
## [1] 12.5
- Standard deviation = square root of the variance, \(s = \sqrt{s^2}\)
## [1] 3.535534
## [1] TRUE
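The chunk that produced these outputs is not shown. Here is a minimal sketch, assuming the data are the c(1, 2, 3, 4, 10) vector named in the MAD note, that reproduces the variance and SD by hand (it gives the 12.5 and 3.535534 above). var_by_hand is a hypothetical helper, not a base R function. The TRUE above is consistent with a check like the final all.equal() call, though the original call is unknown.

```r
# Assumed dataset: the vector named in the MAD note (original chunk not shown)
x <- c(1, 2, 3, 4, 10)

# Hypothetical helper: sample variance with the n - 1 denominator
var_by_hand <- function(v) {
  dev <- v - mean(v)              # deviations from the mean 4: -3 -2 -1 0 6
  sum(dev^2) / (length(v) - 1)    # (9 + 4 + 1 + 0 + 36) / 4 = 50 / 4 = 12.5
}

var_by_hand(x)                    # 12.5, same as var(x)
sqrt(var_by_hand(x))              # 3.535534, same as sd(x)
all.equal(var_by_hand(x), var(x)) # TRUE
all.equal(sd(x), sqrt(var(x)))    # TRUE: sd is the square root of var
```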
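- Median absolute deviation from the median (MAD), \(\text{MAD} = \text{median}(|x_1 - m|, \dots, |x_n - m|)\) where \(m\) is the median, is robust to outliers. R's mad() multiplies it by 1.4826 by default so that it estimates the SD for normally distributed data.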
## [1] 1.4826
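Wait, why did that return the standard scale factor?
The dataset is c(1, 2, 3, 4, 10).
- Its median is 3, and the absolute deviations from the median are 2, 1, 0, 1, 7.
- The median of those deviations is 1: neighbouring values differ by 1, and the outlier contributes only one large deviation (7), which cannot move the median.
- So MAD = 1 * 1.4826 = 1.4826.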
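A minimal sketch reproducing the MAD arithmetic above with base R, where mad() uses constant = 1.4826 by default. The second half makes the outlier much larger (c(1, 2, 3, 4, 100) is an illustrative choice, not from the original) to show the SD jumping while the MAD stays put.

```r
x <- c(1, 2, 3, 4, 10)

m <- median(x)                 # 3
abs_dev <- abs(x - m)          # 2 1 0 1 7
raw_mad <- median(abs_dev)     # 1
raw_mad * 1.4826               # 1.4826, matches mad(x)
mad(x, constant = 1)           # unscaled MAD: 1

# Illustrative only: make the outlier far more extreme
x_big <- c(1, 2, 3, 4, 100)
c(sd(x), sd(x_big))            # the SD grows sharply with the outlier
c(mad(x), mad(x_big))          # the MAD is unchanged: 1.4826 both times
```

The outlier only ever contributes one deviation, so as long as fewer than half the points are extreme, the median of the deviations barely moves.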
1.3.2 Percentiles & Friends
- Percentiles (aka quantiles): the \(P\)th percentile is a value \(x\) such that (approximately) \(P\%\) of the values are \(\le x\)
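x <- sample(1:100, 100, replace = TRUE)  # 100 integers drawn with replacement; no seed is set, so results vary per run
y <- rnorm(100, mean = 50, sd = 20)      # 100 draws from a normal distribution with mean 50, sd 20
quantile(x, probs = seq(0, 1, 0.1))      # deciles of x
## 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
## 2.0 13.0 21.0 27.0 38.0 45.5 55.8 66.3 81.6 96.0 100.0
quantile(y, probs = seq(0, 1, 0.1))      # deciles of y
## 0% 10% 20% 30% 40% 50% 60%
## -1.830337 29.526910 36.006697 40.274329 44.657078 48.180893 54.405362
## 70% 80% 90% 100%
## 58.980133 65.088256 73.732659 100.371937
quantile(x)                              # default probs: quartiles
## 0% 25% 50% 75% 100%
## 2.00 25.00 45.50 73.25 100.00
IQR(x)                                   # interquartile range: 75% quantile minus 25% quantile
## [1] 48.25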
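Base R's quantile() defaults to type = 7, which interpolates linearly between sorted values. For probability \(p\) and sorted data \(x_{(1)} \le \dots \le x_{(n)}\), set \(h = (n-1)p + 1\). The estimate is then \(x_{(\lfloor h \rfloor)} + (h - \lfloor h \rfloor)(x_{(\lfloor h \rfloor + 1)} - x_{(\lfloor h \rfloor)})\). This is why results such as 45.5 and 73.25 above fall between the observed integers. The sketch below applies this rule to the small vector from 1.3.1. quantile7 is a hypothetical helper name, and only the default type is reproduced.

```r
# Hypothetical helper: R's default (type 7) sample quantile, computed by hand
quantile7 <- function(v, p) {
  v <- sort(v)
  n <- length(v)
  h <- (n - 1) * p + 1          # fractional position in the sorted data
  lo <- floor(h)
  hi <- pmin(lo + 1, n)         # avoid indexing past the end when p = 1
  v[lo] + (h - lo) * (v[hi] - v[lo])
}

x <- c(1, 2, 3, 4, 10)
quantile7(x, c(0.1, 0.5, 0.9))  # 1.4 3.0 7.6
unname(quantile(x, c(0.1, 0.5, 0.9)))  # same values from base R

# Interquartile range: 75th minus 25th percentile
quantile7(x, 0.75) - quantile7(x, 0.25)  # 2, same as IQR(x)
```

For example, at \(p = 0.9\) we get \(h = 4.6\), so the estimate is \(4 + 0.6 \times (10 - 4) = 7.6\).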