1.2 Estimates of Location

  • The most basic estimate of location is the mean: the sum of the values divided by their count.
dataset <- c(3, 4, 1, 2, 10)
mean(dataset) # (3 + 4 + 1 + 2 + 10)/5 = 20/5
## [1] 4
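  • Trimmed mean: drop a fixed fraction of the sorted values from each end before averaging, which limits the influence of outliers.
mean(dataset, trim = 1/5) # drops 1 and 10: (2 + 3 + 4)/3 = 9/3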
## [1] 3
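  • The trimming step can be reproduced by hand. This is a minimal sketch, assuming trim < 0.5; the names k, kept and manual_trimmed are illustrative and not part of base R.

```r
# Trimmed mean by hand: sort, drop k values from each end, average the rest.
dataset <- c(3, 4, 1, 2, 10)
trim <- 1/5
n <- length(dataset)
k <- floor(n * trim)                      # values dropped from each end (1 here)
kept <- sort(dataset)[(k + 1):(n - k)]    # 2, 3, 4
manual_trimmed <- mean(kept)
manual_trimmed
## [1] 3
```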
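  • Weighted mean: multiply each value by its weight, sum the products, and divide by the total weight. Use weights to: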
    • Down-weight high-variability values.
    • Up-weight under-represented values.
weights <- c(1, 1, 11, 1, 1)
weighted.mean(dataset, weights) # (3*1 + 4*1 + 1*11 + 2*1 + 10*1)/15 = 30/15
## [1] 2
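  • The same result follows directly from the definition. A short sketch; manual_weighted is an illustrative name.

```r
# Weighted mean by hand: sum of value * weight, divided by the sum of weights.
dataset <- c(3, 4, 1, 2, 10)
weights <- c(1, 1, 11, 1, 1)
manual_weighted <- sum(dataset * weights) / sum(weights)
manual_weighted
## [1] 2
```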
  • Median: sort the values, then take the middle one (or the average of the two middle values when the count is even).
median(dataset) # 1, 2, (3), 4, 10
## [1] 3
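  • The sort-then-pick-the-middle rule can be sketched as a small function. manual_median is a hypothetical helper written for illustration; in practice use median().

```r
# Median by hand: sort, then take the middle value,
# or average the two middle values when the count is even.
manual_median <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]
  } else {
    mean(s[c(n / 2, n / 2 + 1)])
  }
}

manual_median(c(3, 4, 1, 2, 10))  # odd count: middle of 1, 2, 3, 4, 10
## [1] 3
manual_median(c(3, 4, 1, 2))      # even count: average of 2 and 3
## [1] 2.5
```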
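  • Weighted median: sort the values, then find the point at which the cumulative weight reaches half of the total weight.
# Requires the matrixStats package.
# Sort the values, carry each weight along, then find the middle of the total weight: 1*11, 2*1, 3*1, 4*1, 10*1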
matrixStats::weightedMedian(dataset, weights)
## [1] 1.333333
  • By default, weightedMedian() interpolates between neighbouring values, so the result need not be one of the data points.
matrixStats::weightedMedian(dataset, weights, interpolate = TRUE)
## [1] 1.333333
  • Set interpolate = FALSE to get an actual data value instead.
matrixStats::weightedMedian(dataset, weights, interpolate = FALSE)
## [1] 1
# Here this matches the median of the data with each value repeated according to its weight.
median(c(rep(1, 11), 2, 3, 4, 10))
## [1] 1
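  • The non-interpolated idea can be sketched in base R with cumulative weights. This is an illustrative sketch, not the algorithm matrixStats uses; lower_weighted_median is a hypothetical helper, and its handling of ties (cumulative weight landing exactly on half the total) may differ from weightedMedian().

```r
# Weighted median without interpolation: sort the values, accumulate
# their weights, and return the first value whose cumulative weight
# reaches at least half of the total weight.
lower_weighted_median <- function(x, w) {
  ord <- order(x)
  x_sorted <- x[ord]
  cum_w <- cumsum(w[ord])
  x_sorted[which(cum_w >= sum(w) / 2)[1]]
}

dataset <- c(3, 4, 1, 2, 10)
weights <- c(1, 1, 11, 1, 1)
# Sorted values 1, 2, 3, 4, 10 have cumulative weights 11, 12, 13, 14, 15;
# half of the total weight is 7.5, which the first value already reaches.
lower_weighted_median(dataset, weights)
## [1] 1
```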