9.7 - Final thoughts
Decision trees have a number of advantages:
They require very little pre-processing.
They can easily handle categorical features without pre-processing.
They can handle missing values, either by creating a new “missing” class for categorical variables or by using surrogate splits (see Therneau, Atkinson, and others (1997) for details).
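As a small illustration of the first option, the sketch below recodes missing entries of a categorical feature as their own level, so that a tree can split on "missing" like any other category. It is written in plain Python for illustration only; the helper `add_missing_level`, the label `"missing"`, and the `roof_style` values are hypothetical and not taken from any particular library or data set.

```python
from collections import Counter


def add_missing_level(values, label="missing"):
    """Recode None entries of a categorical feature as an explicit level.

    Hypothetical helper for illustration; `label` is an assumed name.
    """
    return [label if v is None else v for v in values]


# Made-up categorical feature with two missing entries.
roof_style = ["gable", None, "hip", "gable", None]

recoded = add_missing_level(roof_style)
print(recoded)
print(Counter(recoded))  # "missing" now appears as a regular category
```

Surrogate splits take a different route: they keep the value missing and fall back on a correlated feature to decide which branch the observation follows.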
However, individual decision trees generally do not achieve state-of-the-art predictive accuracy.
Furthermore, we saw that deep trees tend to have high variance (and low bias), whereas shallow trees tend to be overly biased (but have low variance).
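The trade-off between deep and shallow trees can be checked with a small simulation. The following is a minimal sketch, not the implementation used elsewhere in this chapter: it assumes a one-dimensional feature, a true signal of sin(2πx), Gaussian noise, and a simple greedy regression tree written from scratch. All names (`fit_tree`, `predict`, `bias_variance`) and settings (sample size, noise level, depths of 1 and 10) are illustrative assumptions.

```python
import math
import random


def true_f(x):
    """Assumed true signal for the simulation."""
    return math.sin(2 * math.pi * x)


def fit_tree(points, max_depth):
    """Fit a 1-D regression tree to (x, y) pairs sorted by x.

    Returns a float for a leaf or a (threshold, left, right) tuple.
    """
    ys = [y for _, y in points]
    mean = sum(ys) / len(ys)
    if max_depth == 0 or len(points) < 2:
        return mean
    total, left_sum = sum(ys), 0.0
    best_score, best_i = -math.inf, None
    for i in range(1, len(points)):
        left_sum += points[i - 1][1]
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between identical x values
        right_sum = total - left_sum
        # Maximizing this quantity is equivalent to minimizing the SSE.
        score = left_sum ** 2 / i + right_sum ** 2 / (len(points) - i)
        if score > best_score:
            best_score, best_i = score, i
    if best_i is None:
        return mean
    threshold = (points[best_i - 1][0] + points[best_i][0]) / 2
    return (threshold,
            fit_tree(points[:best_i], max_depth - 1),
            fit_tree(points[best_i:], max_depth - 1))


def predict(tree, x):
    """Walk down the tree until a leaf (a float) is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def bias_variance(max_depth, n_train=60, n_reps=200, noise_sd=0.5, seed=1):
    """Estimate average squared bias and variance over a grid of x values."""
    rng = random.Random(seed)
    grid = [(i + 0.5) / 50 for i in range(50)]
    preds = [[] for _ in grid]
    for _ in range(n_reps):
        xs = [rng.random() for _ in range(n_train)]
        train = sorted((x, true_f(x) + rng.gauss(0, noise_sd)) for x in xs)
        tree = fit_tree(train, max_depth)
        for j, x in enumerate(grid):
            preds[j].append(predict(tree, x))
    bias_sq = var = 0.0
    for x, p in zip(grid, preds):
        avg = sum(p) / len(p)
        bias_sq += (avg - true_f(x)) ** 2
        var += sum((v - avg) ** 2 for v in p) / len(p)
    return bias_sq / len(grid), var / len(grid)


results = {depth: bias_variance(depth) for depth in (1, 10)}
for depth, (bias_sq, var) in results.items():
    print(f"max_depth={depth:2d}  bias^2={bias_sq:.3f}  variance={var:.3f}")
```

Under these assumptions, the depth-1 tree should show the larger squared bias and the depth-10 tree the larger variance, since the deep tree's prediction at any point depends on very few training observations.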