1.3 Estimates of Variability

  • Variability (aka dispersion) = are values clustered or spread out?

1.3.1 SD & Friends

  • Variance = sum of squared deviations from the mean, divided by \(n - 1\): \(s^2 = \frac{\sum_{i=1}^{n}{(x_{i}-\bar{x})^2}}{n-1}\)
s_squared <- var(dataset)
s_squared
## [1] 12.5
  • Standard deviation = square root of variance, \(s = \sqrt{s^2}\)
s <- sd(dataset)
s
## [1] 3.535534
all.equal(s, sqrt(s_squared)) # safer than == for floating-point values
## [1] TRUE
  • Median absolute deviation from the median (MAD) = \(\text{median}(|x_i - m|)\), where \(m\) is the median of the data. It is robust to outliers.
mad(dataset)
## [1] 1.4826

Wait, why did that return the standard scale factor?

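  • dataset is c(1, 2, 3, 4, 10), so the median is 3
  • The absolute deviations from the median are 2, 1, 0, 1, 7, and their median is 1
  • mad() multiplies that by the constant 1.4826 by default (so the result is comparable to the SD for normally distributed data): 1 * 1.4826 = 1.4826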
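
The three estimates above can be recomputed by hand to see where each number comes from. This is a minimal sketch, assuming dataset is the vector c(1, 2, 3, 4, 10) given above; the `*_manual` names and `no_outlier` are introduced here for illustration only.

```r
# Assumption: the same vector used in the notes above.
dataset <- c(1, 2, 3, 4, 10)
n <- length(dataset)

# Variance: sum of squared deviations from the mean, divided by n - 1.
var_manual <- sum((dataset - mean(dataset))^2) / (n - 1)

# Standard deviation: square root of the variance.
sd_manual <- sqrt(var_manual)

# MAD: median of the absolute deviations from the median,
# scaled by the default constant that mad() applies.
abs_dev <- abs(dataset - median(dataset))
mad_manual <- 1.4826 * median(abs_dev)

# Each comparison should agree with the built-in function.
all.equal(var_manual, var(dataset))
all.equal(sd_manual, sd(dataset))
all.equal(mad_manual, mad(dataset))

# Robustness check: drop the outlier and compare both estimates.
no_outlier <- dataset[dataset != 10]
c(sd_with = sd(dataset), sd_without = sd(no_outlier))
c(mad_with = mad(dataset), mad_without = mad(no_outlier))
```

For this particular vector the SD changes noticeably once the 10 is dropped, while the MAD stays the same. Calling mad(dataset, constant = 1) returns the unscaled median absolute deviation instead.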

1.3.2 Percentiles & Friends

  • Percentiles = quantiles expressed in percent: the \(P\)th percentile is a value \(x\) such that at least \(P\%\) of values are \(\leq x\)
x <- sample(1:100, 100, replace = TRUE)
y <- rnorm(100, mean = 50, sd = 20)
quantile(x, probs = seq(0, 1, 0.1))
##    0%   10%   20%   30%   40%   50%   60%   70%   80%   90%  100% 
##   2.0  13.0  21.0  27.0  38.0  45.5  55.8  66.3  81.6  96.0 100.0
quantile(y, probs = seq(0, 1, 0.1))
##         0%        10%        20%        30%        40%        50%        60% 
##  -1.830337  29.526910  36.006697  40.274329  44.657078  48.180893  54.405362 
##        70%        80%        90%       100% 
##  58.980133  65.088256  73.732659 100.371937
quantile(x) # quartiles (the default probs)
##     0%    25%    50%    75%   100% 
##   2.00  25.00  45.50  73.25 100.00
IQR(x) # Interquartile range (75th - 25th percentile). The book introduces this later, but I like it here.
## [1] 48.25
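
The IQR is the difference between the 75th and 25th percentiles, which matches the output above (73.25 - 25 = 48.25). A small sketch of that relationship follows. The seed value and the names `q` and `iqr_manual` are assumptions added here so the example is reproducible, so the sampled values will differ from the ones shown above.

```r
# Assumption: an arbitrary seed, only so the sample is reproducible.
set.seed(42)
x <- sample(1:100, 100, replace = TRUE)

# Ask quantile() for just the first and third quartiles.
q <- quantile(x, probs = c(0.25, 0.75))

# IQR by hand: third quartile minus first quartile.
iqr_manual <- unname(q["75%"] - q["25%"])

# Should agree with the built-in IQR().
all.equal(iqr_manual, IQR(x))
```

Like the MAD, the IQR ignores the extremes of the data, so a single outlier has little effect on it.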