9.1 - Introduction

Tree-based models are a class of nonparametric algorithms that work by partitioning the feature space into a number of smaller (non-overlapping) regions with similar response values using a set of splitting rules.

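As a brief illustration, the sketch below (assuming Python with scikit-learn; the toy data and feature names are purely illustrative) fits a shallow regression tree and prints the splitting rules that define its non-overlapping regions of the feature space.

```python
# Minimal sketch, assuming scikit-learn is available.
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor, export_text

# Toy data: 200 samples and 2 features so the splits are easy to read.
X, y = make_regression(n_samples=200, n_features=2, noise=10.0, random_state=42)

# A shallow tree keeps the set of splitting rules small and interpretable.
tree = DecisionTreeRegressor(max_depth=2, random_state=42)
tree.fit(X, y)

# Each leaf corresponds to one rectangular region of the feature space;
# the printed rules are the splits that define those regions.
print(export_text(tree, feature_names=["x1", "x2"]))
```
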
Advantages:

  • Nonlinear relationships: successive splits can capture nonlinear patterns and interactions without the form of the relationship having to be specified in advance.

  • Interpretability: decision trees provide easily understandable rules that can be visualized through tree diagrams.

  • Robustness to outliers: splits are based on the ordering of feature values rather than their magnitude, so extreme feature values have little effect on where the splits occur.

Disadvantages:

  • Overfitting: a deep, complex tree can fit noise in the training data and generalize poorly.

  • Continuous variables: predictions are piecewise constant, so trees are less effective at modeling smooth relationships in continuous variables with a wide range of values.

  • Instability: decision trees can be sensitive to small changes in the data, leading to different tree structures or predictions.

  • Lack of predictive power: a single decision tree often has lower predictive performance than more complex algorithms such as neural networks or gradient boosting machines.

It’s important to note that many of these disadvantages can be mitigated through pruning or through ensemble methods such as random forests and gradient boosting, which combine multiple decision trees; a short sketch follows below.
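
As an illustration, the sketch below (assuming Python with scikit-learn; the synthetic data and the ccp_alpha value are illustrative, not tuned) compares an unpruned tree, a cost-complexity-pruned tree, and a random forest using cross-validated R².

```python
# Hedged sketch of two common mitigations: cost-complexity pruning of a
# single tree and bagging many trees into a random forest (scikit-learn assumed).
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=15.0, random_state=0)

models = {
    "deep tree": DecisionTreeRegressor(random_state=0),
    # ccp_alpha is illustrative; in practice it would be tuned, e.g. via
    # cost_complexity_pruning_path and cross-validation.
    "pruned tree": DecisionTreeRegressor(ccp_alpha=0.01, random_state=0),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

for name, model in models.items():
    # Cross-validated R^2 gives a rough sense of out-of-sample performance.
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {score:.3f}")
```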