10.9 Useful Examples - Histograms and binwidth

Useful when…

  • You need to pass a function
  • You don’t want to have to re-write the function every time (the default behaviour of the function should be flexible)

For example, a fixed binwidth of 2 is not appropriate for all three facets, because the spread of x differs with sd:

library(ggplot2)

# Three groups of normal draws with different standard deviations
sd <- c(1, 5, 15)
n <- 100
df <- data.frame(x = rnorm(3 * n, sd = sd), sd = rep(sd, n))

ggplot(df, aes(x)) + 
  geom_histogram(binwidth = 2) + 
  facet_wrap(~ sd, scales = "free_x") + 
  labs(x = NULL)

We could just make a function that computes the binwidth from the data in each facet…

binwidth_bins <- function(x) (max(x) - min(x)) / 20

ggplot(df, aes(x = x)) + 
  geom_histogram(binwidth = binwidth_bins) + 
  facet_wrap(~ sd, scales = "free_x") + 
  labs(x = NULL)
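
To see what this function computes without plotting, here is a minimal sketch that applies it to each sd group by hand. It assumes ggplot2 evaluates the binwidth function separately on the x values of each group, which is approximated here with split(); the seed value and the name widths are only for illustration.

```r
set.seed(1)  # illustrative seed, so the sketch is reproducible

sd <- c(1, 5, 15)
n <- 100
df <- data.frame(x = rnorm(3 * n, sd = sd), sd = rep(sd, n))

binwidth_bins <- function(x) (max(x) - min(x)) / 20

# One binwidth per sd group: the range of x in that group divided by 20
widths <- vapply(split(df$x, df$sd), binwidth_bins, numeric(1))
widths
```

Each group gets its own width, so every facet ends up with roughly 20 bins regardless of its spread.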

But if we want to change the number of bins (20) we’d have to re-write the function each time.

If we use a factory, we don’t have to do that.

binwidth_bins <- function(n) {
  force(n)
  function(x) (max(x) - min(x)) / n
}

ggplot(df, aes(x = x)) + 
  geom_histogram(binwidth = binwidth_bins(20)) + 
  facet_wrap(~ sd, scales = "free_x") + 
  labs(x = NULL, title = "20 bins")


ggplot(df, aes(x = x)) + 
  geom_histogram(binwidth = binwidth_bins(5)) + 
  facet_wrap(~ sd, scales = "free_x") + 
  labs(x = NULL, title = "5 bins")
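
The factory can also be checked outside ggplot2. The sketch below shows that each call returns a separate function that remembers its own n, and what force(n) protects against. binwidth_bins_lazy is a hypothetical variant without force(), included only for comparison, and x and n_bins are made-up values.

```r
# Factory as defined above
binwidth_bins <- function(n) {
  force(n)
  function(x) (max(x) - min(x)) / n
}

# Hypothetical variant without force(), for comparison only
binwidth_bins_lazy <- function(n) {
  function(x) (max(x) - min(x)) / n
}

x <- c(0, 10)  # range of 10

# Each manufactured function keeps its own n
bw20 <- binwidth_bins(20)
bw5 <- binwidth_bins(5)
bw20(x)  # 0.5
bw5(x)   # 2

# Lazy evaluation: change the variable after creating the functions,
# but before either one is called for the first time
n_bins <- 20
forced <- binwidth_bins(n_bins)
lazy <- binwidth_bins_lazy(n_bins)
n_bins <- 5

forced(x)  # 0.5: n was fixed at 20 when the factory ran
lazy(x)    # 2: n is only evaluated now, so it picks up 5
```

Without force(), the argument is only evaluated on the first call of the manufactured function, so a later change to n_bins leaks in.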

The Box-Cox example shows a similar benefit.