22.7 Nonparametric regression and machine learning

  • Nonparametric regression - the regression curve is not constrained to follow any particular parametric form.

  • Machine learning describes nonparametric regression where the focus is on prediction rather than parameter estimation. Performance is often assessed on held-out ‘test’ data.

  • To avoid overfitting, nonparametric models use a variety of techniques to constrain the fit. Tuning parameters (hyperparameters) govern the amount of constraint and are typically chosen by cross-validation.

  • Some examples of nonparametric models:

    • Loess - locally weighted regression, tuned by the strength of the weight function.

    • Splines - nonlinear basis functions, tuning controls the ‘local smoothness’

    • Gaussian processes - multivariate Gaussian model, tuning controls the correlation distance

    • Tree models - Decision trees are very powerful nonparametric models, especially gradient boosted trees (e.g. XGBoost).

    • BART - Bayesian additive regression trees: a sum of many small trees, with priors over the tree structures and the leaf values. For more, ROS recommends: Bayesian Additive Regression Trees: A Review and Look Forward
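To make the Gaussian-process bullet concrete, here is a minimal numpy-only sketch of GP regression with a squared-exponential kernel. The `length_scale` argument is the tuning parameter mentioned above (the correlation distance); the function names and the noise level are my own illustrative choices, not from the source.

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    # Squared-exponential kernel: correlation between points decays with
    # distance, at a rate governed by the length_scale tuning parameter.
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(x_train, y_train, x_test, length_scale=1.0, noise=0.1):
    # Posterior mean of a zero-mean GP: K_*^T (K + sigma^2 I)^{-1} y
    K = rbf_kernel(x_train, x_train, length_scale)
    K += noise**2 * np.eye(len(x_train))
    K_star = rbf_kernel(x_train, x_test, length_scale)
    return K_star.T @ np.linalg.solve(K, y_train)

# Noisy observations of a smooth function (synthetic data for illustration)
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 30)
y = np.sin(x) + rng.normal(0, 0.1, size=x.size)

# Predict at a new point; a larger length_scale gives a smoother fit
print(gp_predict(x, y, np.array([np.pi / 2]), length_scale=1.0))
```

Shrinking `length_scale` lets the curve wiggle to chase individual points (overfitting); growing it flattens the curve toward the prior mean — exactly the constraint/flexibility trade-off the tuning parameter controls.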

  • Many of these methods are covered, or at least introduced, in An Introduction to Statistical Learning
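The loess and cross-validation ideas above can be sketched together: a locally weighted linear fit (a simplified loess, using Gaussian rather than tricube weights) whose bandwidth — the strength of the weight function — is chosen by leave-one-out cross-validation. The data, bandwidth grid, and function names are illustrative assumptions.

```python
import numpy as np

def local_regression(x_train, y_train, x0, bandwidth):
    # Weighted least squares centered at x0: points near x0 get high weight.
    # (Real loess uses tricube weights; Gaussian weights keep the sketch short.)
    w = np.exp(-0.5 * ((x_train - x0) / bandwidth) ** 2)
    X = np.column_stack([np.ones_like(x_train), x_train])
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y_train))
    return beta[0] + beta[1] * x0

def loo_cv_error(x, y, bandwidth):
    # Leave-one-out CV: predict each point from all the others.
    errs = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        pred = local_regression(x[mask], y[mask], x[i], bandwidth)
        errs.append((y[i] - pred) ** 2)
    return float(np.mean(errs))

# Synthetic curved data (for illustration)
rng = np.random.default_rng(1)
x = np.linspace(0, 3, 40)
y = x**2 + rng.normal(0, 0.3, size=x.size)

# Pick the bandwidth (hyperparameter) that minimizes the CV error
bandwidths = [0.05, 0.2, 0.5, 1.0, 2.0]
best = min(bandwidths, key=lambda h: loo_cv_error(x, y, h))
print("chosen bandwidth:", best)
```

A tiny bandwidth interpolates the noise; a huge one collapses to an ordinary global linear fit. Cross-validation picks the middle ground, which is the general recipe for tuning any of the models listed above.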