10.4 Irrelevant features
How much irrelevant predictors degrade predictive performance depends on:
- type of model
- nature of the predictors
- ratio of the size of the training set to the number of predictors
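The last point can be illustrated with a minimal sketch, assuming a plain ordinary least squares model and a hypothetical data set in which only five standard-normal predictors affect the outcome (each with coefficient 1, noise standard deviation 1). Irrelevant standard-normal columns are then appended, and test RMSE is measured on a large held-out sample. The function name `ols_test_rmse` and all settings are illustrative assumptions, not taken from the source.

```python
import numpy as np


def ols_test_rmse(n_train, n_noise, n_signal=5, n_test=5000, sigma=1.0, seed=0):
    """Fit OLS on n_signal informative + n_noise irrelevant predictors; return test RMSE.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    p = n_signal + n_noise
    beta = np.zeros(p)
    beta[:n_signal] = 1.0  # only the first n_signal predictors affect the outcome

    def simulate(n):
        X = rng.standard_normal((n, p))
        y = X @ beta + rng.normal(0.0, sigma, size=n)
        return X, y

    X_tr, y_tr = simulate(n_train)
    X_te, y_te = simulate(n_test)

    # Add an intercept column and solve the least-squares problem.
    A_tr = np.column_stack([np.ones(n_train), X_tr])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)

    # Evaluate on the held-out sample.
    A_te = np.column_stack([np.ones(n_test), X_te])
    resid = y_te - A_te @ coef
    return float(np.sqrt(np.mean(resid ** 2)))


# Same training-set size, increasing number of irrelevant predictors.
for n_noise in (0, 50, 100, 130):
    print(n_noise, round(ols_test_rmse(n_train=150, n_noise=n_noise), 3))
```

With 150 training rows, adding irrelevant columns pushes the number of estimated coefficients toward the number of rows, and test RMSE should grow well above the noise level; with a larger training set the same irrelevant columns should cost much less.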
This simulation system from Sapp et al. (2014) is an example of a nonlinear function of 20 predictors:
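$$
y = x_1 + \sin(x_2) + \log(|x_3|) + x_4^2 + x_5 x_6 + I(x_7 x_8 x_9 < 0) + I(x_{10} > 0) + x_{11} I(x_{11} > 0) + \sqrt{|x_{12}|} + \cos(x_{13}) + 2x_{14} + |x_{15}| + I(x_{16} < -1) + I(x_{17} < -1) - 2x_{18} - x_{19} x_{20} + \epsilon
$$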
Each $x_i$ is generated as an independent standard normal random variable, and the error term is normally distributed, $\epsilon \sim N(0, 3)$.
Between 10 and 200 extra variables, unrelated to the outcome, are then added to the predictor set.
source: FES Selection Simulation
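The simulation above can be sketched in NumPy as follows. This is a minimal sketch, not the authors' code, and it rests on assumptions: $N(0, 3)$ is read as a standard deviation of 3 (exposed as the hypothetical `noise_sd` parameter, since the notation could also mean a variance of 3), and the extra variables are assumed to be independent standard normals that do not enter the formula. The function name `simulate_sapp` is hypothetical; the terms follow the equation exactly as written in these notes.

```python
import numpy as np


def indicator(cond):
    """I(.) from the formula: 1.0 where the condition holds, else 0.0."""
    return cond.astype(float)


def simulate_sapp(n, n_extra=10, noise_sd=3.0, seed=None):
    """Hypothetical generator for the 20-predictor system plus n_extra irrelevant columns.

    Assumes noise_sd is the standard deviation of epsilon and that the extra
    columns are independent standard normals with no effect on y.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, 20 + n_extra))
    x = {i: X[:, i - 1] for i in range(1, 21)}  # 1-based names x1..x20

    y = (x[1] + np.sin(x[2]) + np.log(np.abs(x[3])) + x[4] ** 2 + x[5] * x[6]
         + indicator(x[7] * x[8] * x[9] < 0) + indicator(x[10] > 0)
         + x[11] * indicator(x[11] > 0) + np.sqrt(np.abs(x[12])) + np.cos(x[13])
         + 2 * x[14] + np.abs(x[15]) + indicator(x[16] < -1) + indicator(x[17] < -1)
         - 2 * x[18] - x[19] * x[20])
    y = y + rng.normal(0.0, noise_sd, size=n)  # additive error term epsilon
    return X, y
```

For example, `simulate_sapp(1000, n_extra=200, seed=1)` returns a 1000 × 220 predictor matrix in which only the first 20 columns affect `y`; varying `n_extra` between 10 and 200 mirrors the range of irrelevant predictors described above.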

Figure 10.2: RMSE trends for different models and simulation configurations