8.15 Bagging

  • Also known as bootstrap aggregation is a general-purpose procedure for reducing the variance of a statistical learning method

  • It’s useful and frequently used in the context of decision trees

  • Recall that given a set of \(n\) independent observations \(Z_1,..., Z_n\), each with variance \(\sigma^2\), the variance of the mean \(\bar{Z}\) of the observations is given by \(\sigma^2/n\)

  • So, averaging a set of observations reduces variance

  • But, this is not practical because we generally do not have access to multiple training sets!

  • What to do?!