5.8 Summary

Data splitting is an important part of a modeling workflow because it affects both model validity and performance. The most common splitting technique is random splitting. Some data call for a different technique: time-series data should be split by time, multi-level data should be split by group, and data with imbalanced outcome classes benefit from stratified sampling. The rsample package contains many functions that can perform random splitting and stratified splitting.
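As a brief recap, the splits discussed in this chapter can be sketched with rsample as follows. This is a minimal illustration rather than a recommended recipe: the built-in mtcars data, the use of `am` as a stand-in class label, the seed, and the 75/25 proportion are all assumptions chosen for the example.

```r
library(rsample)

set.seed(123)  # make the random splits reproducible

# mtcars ships with base R; `am` (transmission type) is an assumed class label
cars <- mtcars
cars$am <- factor(cars$am)

# Simple random split: roughly 75% training, 25% testing
random_split <- initial_split(cars, prop = 0.75)

# Stratified split: sampling is done within each level of `am`,
# so both sets keep a similar class balance
strat_split <- initial_split(cars, prop = 0.75, strata = am)
train <- training(strat_split)
test  <- testing(strat_split)

# Time-ordered split: the first 75% of rows become the training set
# (assumes the rows are already sorted by time, which mtcars is not)
time_split <- initial_time_split(cars, prop = 0.75)
```

Every row ends up in exactly one of the two sets, so `nrow(train) + nrow(test)` equals `nrow(cars)`. Comparing `table(train$am)` with `table(test$am)` is a quick way to check that stratification kept the class proportions similar.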

Chapter 10 covers how to remedy issues such as class imbalance, bias, and overfitting.

5.8.1 References