10.4 Irrelevant features

How much irrelevant features hurt predictive performance depends on:

  • type of model
  • nature of the predictors
  • ratio of the size of the training set to the number of predictors

This simulation system from Sapp et al. (2014) is an example of a nonlinear function of 20 predictors:

\[\begin{aligned} y ={} & x_1+\sin(x_2)+\log(|x_3|)+x_4^2+x_5x_6+I(x_7x_8x_9<0)+I(x_{10}>0)+ \\ & x_{11}I(x_{11}>0)+\sqrt{|x_{12}|}+\cos(x_{13})+2x_{14}+|x_{15}|+ \\ & I(x_{16}< -1)+I(x_{17}< -1)-2x_{18}-x_{19}x_{20}+\epsilon \end{aligned}\]

Each \(x_i\) is drawn independently from a standard normal distribution, and the error term is normally distributed, \(\epsilon\sim N(0,3)\).

Between 10 and 200 extra predictors that have no connection to the outcome are then added to the data.
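
The simulation described above can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the code used for the figure: the function name `simulate_sapp` and its parameters are hypothetical, the 3 in \(N(0,3)\) is assumed to be a standard deviation, and the extra predictors are assumed to be independent standard normal columns.

```python
import numpy as np


def simulate_sapp(n, n_noise=0, error_sd=3.0, seed=0):
    """Simulate n rows from the 20-predictor system plus n_noise irrelevant columns.

    Assumptions (not stated in the notes): error_sd is the standard
    deviation of epsilon, and the irrelevant predictors are independent
    standard normal draws.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, 20))  # the 20 informative predictors

    def c(j):
        # Column accessor with 1-based indices, to mirror x_1 ... x_20.
        return x[:, j - 1]

    eps = rng.normal(0.0, error_sd, size=n)
    y = (
        c(1) + np.sin(c(2)) + np.log(np.abs(c(3))) + c(4) ** 2 + c(5) * c(6)
        + (c(7) * c(8) * c(9) < 0) + (c(10) > 0)
        + c(11) * (c(11) > 0) + np.sqrt(np.abs(c(12))) + np.cos(c(13))
        + 2 * c(14) + np.abs(c(15))
        + (c(16) < -1) + (c(17) < -1) - 2 * c(18) - c(19) * c(20)
        + eps
    )

    # Irrelevant predictors: generated independently of y.
    noise = rng.standard_normal((n, n_noise))
    return np.hstack([x, noise]), y


# Example: 500 rows with 100 irrelevant predictors appended.
X, y = simulate_sapp(n=500, n_noise=100, seed=42)
```

Calling the function repeatedly with `n_noise` between 10 and 200 and different values of `n` gives the kind of grid of configurations over which model RMSE can be compared.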

source: FES Selection Simulation

Figure 10.2: RMSE trends for different models and simulation configurations