The previous method may result in a tree that overfits the data. Why?
 
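The tree is too leafy (too complex). With many splits, each terminal node contains only a few training observations, so the tree fits the noise in the training data as well as the signal.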
 
A better strategy is to use a smaller tree with fewer splits (fewer terminal regions). This lowers variance and makes the results easier to interpret, at the cost of a little bias.
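To make the overfitting concrete, here is a minimal sketch in standard-library Python. It assumes a regression setting with a single numeric predictor and RSS-based recursive binary splitting. The step-function ground truth, the noise level, the sample sizes, and helper names such as `build_tree` are hypothetical choices for illustration. The unrestricted tree ends up with one leaf per training point and zero training error. The depth-limited tree has at most four leaves. On data like this you would typically expect the small tree to have the lower test MSE, but the exact printed numbers depend on the random draw.

```python
import random


def rss(ys):
    """Residual sum of squares of ys around their mean."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(xs, ys, max_depth=None, depth=0):
    """Recursive binary splitting on one numeric feature, choosing the RSS-minimizing split.

    A leaf is a float (the mean response); an internal node is (threshold, left, right).
    With max_depth=None the tree keeps splitting until each leaf holds one distinct x value.
    """
    if len(set(xs)) == 1 or (max_depth is not None and depth >= max_depth):
        return sum(ys) / len(ys)
    pairs = sorted(zip(xs, ys))
    best = None  # (cost, split index)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        cost = rss([y for _, y in pairs[:i]]) + rss([y for _, y in pairs[i:]])
        if best is None or cost < best[0]:
            best = (cost, i)
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    left, right = pairs[:i], pairs[i:]
    return (
        threshold,
        build_tree([x for x, _ in left], [y for _, y in left], max_depth, depth + 1),
        build_tree([x for x, _ in right], [y for _, y in right], max_depth, depth + 1),
    )


def predict(tree, x):
    """Follow the splits down to a leaf and return its mean response."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def count_leaves(tree):
    """Number of terminal nodes |T|."""
    if not isinstance(tree, tuple):
        return 1
    return count_leaves(tree[1]) + count_leaves(tree[2])


def mse(tree, xs, ys):
    return sum((y - predict(tree, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


def true_f(x):
    # Hypothetical ground truth: a single step at x = 0.5
    return 2.0 if x < 0.5 else 5.0


random.seed(0)
train_x = [(i + 0.5) / 50 for i in range(50)]  # 50 distinct training inputs
train_y = [true_f(x) + random.gauss(0, 1) for x in train_x]
test_x = [random.random() for _ in range(500)]
test_y = [true_f(x) + random.gauss(0, 1) for x in test_x]

full = build_tree(train_x, train_y)                # grown until every leaf is one point
small = build_tree(train_x, train_y, max_depth=2)  # at most 4 leaves

for name, tree in [("full", full), ("small", small)]:
    print(name, "leaves:", count_leaves(tree),
          "train MSE:", round(mse(tree, train_x, train_y), 3),
          "test MSE:", round(mse(tree, test_x, test_y), 3))
```

The full tree's zero training error is exactly the "too leafy" symptom: it has memorized the noise, so its predictions change a lot from one training sample to the next.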
 
So we will prune: grow a large tree first, then cut it back to a smaller subtree.
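One common way to decide how far to cut back (assumed here, since these notes have not yet fixed a method) is cost-complexity pruning: for a tuning parameter alpha >= 0, keep the subtree T minimizing RSS(T) + alpha * |T|, where |T| is the number of terminal nodes. The minimal standard-library Python sketch below finds that subtree for a fixed alpha. At each node it compares the cost of collapsing the node into a single leaf with the best achievable cost of keeping its split, working bottom-up. Because the criterion adds up over leaves, this comparison yields the minimizing subtree. The data, the alpha values, and the helper names (`grow`, `prune`) are hypothetical. In practice alpha is usually chosen by cross-validation.

```python
import random


def rss(ys):
    """Residual sum of squares of ys around their mean."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def grow(xs, ys):
    """Grow a full regression tree on one numeric feature by recursive binary splitting.

    Every node stores its mean response and its RSS, so it can later be collapsed into a leaf.
    """
    node = {"value": sum(ys) / len(ys), "rss": rss(ys), "threshold": None}
    if len(set(xs)) == 1:
        return node  # cannot split further
    pairs = sorted(zip(xs, ys))
    best_cost, best_i = None, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        cost = rss([y for _, y in pairs[:i]]) + rss([y for _, y in pairs[i:]])
        if best_cost is None or cost < best_cost:
            best_cost, best_i = cost, i
    node["threshold"] = (pairs[best_i - 1][0] + pairs[best_i][0]) / 2
    node["left"] = grow([x for x, _ in pairs[:best_i]], [y for _, y in pairs[:best_i]])
    node["right"] = grow([x for x, _ in pairs[best_i:]], [y for _, y in pairs[best_i:]])
    return node


def prune(node, alpha):
    """Return (subtree, cost) minimizing RSS(T) + alpha * |T| among subtrees rooted at node."""
    leaf = {"value": node["value"], "rss": node["rss"], "threshold": None}
    leaf_cost = node["rss"] + alpha  # cost if this node becomes a single terminal node
    if node["threshold"] is None:
        return leaf, leaf_cost
    left, left_cost = prune(node["left"], alpha)
    right, right_cost = prune(node["right"], alpha)
    if left_cost + right_cost < leaf_cost:  # keeping the split pays for its extra leaves
        return {**node, "left": left, "right": right}, left_cost + right_cost
    return leaf, leaf_cost  # on ties, prefer the smaller tree


def count_leaves(node):
    if node["threshold"] is None:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])


random.seed(1)
xs = [(i + 0.5) / 40 for i in range(40)]  # 40 distinct inputs
ys = [(2.0 if x < 0.5 else 5.0) + random.gauss(0, 1) for x in xs]  # hypothetical step + noise
full = grow(xs, ys)

for alpha in [0.0, 0.5, 2.0, 10.0, 1e6]:
    pruned, cost = prune(full, alpha)
    print("alpha =", alpha, "leaves:", count_leaves(pruned), "cost:", round(cost, 3))
```

With alpha = 0 nothing is pruned, because extra leaves are free. As alpha grows, each leaf has to buy enough RSS reduction to stay, so the tree shrinks until, for very large alpha, only the root remains.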