8.15 Bagging

  • Bagging, also known as bootstrap aggregation, is a general-purpose procedure for reducing the variance of a statistical learning method

  • It is particularly useful, and frequently used, in the context of decision trees

  • Recall that given a set of $n$ independent observations $Z_1, \dots, Z_n$, each with variance $\sigma^2$, the variance of the mean $\bar{Z}$ of the observations is given by $\sigma^2/n$

  • So, averaging a set of observations reduces variance

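  • A natural way to reduce the variance of a learning method would therefore be to fit a separate model on each of many training sets and average the resulting predictions, but this is not practical because we generally do not have access to multiple training sets!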

  • What to do?!
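
The variance claim above (the mean of n independent observations has variance σ²/n) can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the observations are drawn from a normal distribution with σ = 2, and the helper name `variance_of_sample_mean`, the number of repeats, and the seed are hypothetical choices, not part of the original notes.

```python
import random
import statistics


def variance_of_sample_mean(n, sigma=2.0, n_repeats=20000, seed=0):
    """Empirically estimate the variance of the mean of n i.i.d. draws.

    Assumption for illustration: each draw is N(0, sigma^2).
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_repeats):
        # One "sample" of n independent observations, reduced to its mean.
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    # Spread of the sample means across many repeated samples.
    return statistics.pvariance(means)


if __name__ == "__main__":
    sigma = 2.0
    for n in (1, 5, 25):
        empirical = variance_of_sample_mean(n, sigma=sigma)
        theoretical = sigma ** 2 / n
        print(f"n={n:2d}  empirical={empirical:.4f}  theoretical={theoretical:.4f}")
```

The empirical variance should track σ²/n closely and shrink as n grows, which is the effect that averaging over many training sets would exploit if those sets were available.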