6.1 Create a Model

6.1.1 Different Model Interfaces



  • Model Interfaces
    • Different Implementations = Different Interfaces
    • Linear Regression can be implemented in many ways
      • Ordinary Least Squares
      • Regularized Linear Regression


  • {stats}
    • takes formula
    • uses data.frame
lm(formula, data, ...)


  • {glmnet}
    • Has x/y interface
    • Uses a matrix
glmnet(x = matrix, y = vector, family = "gaussian", ...)
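
The contrast between the two calling conventions can be sketched with base R alone. This is only an illustration: the built-in mtcars data and the chosen columns are assumptions, and lm.fit() stands in for an x/y-style function such as glmnet(), which takes a matrix and a vector in the same way.

```r
# Formula interface: stats::lm() takes a formula plus a data.frame.
fit_formula <- lm(mpg ~ wt + hp, data = mtcars)

# x/y interface: a numeric predictor matrix and a response vector.
# lm.fit() is a base-R stand-in here, not glmnet itself.
x <- as.matrix(mtcars[, c("wt", "hp")])
y <- mtcars$mpg
fit_matrix <- lm.fit(x = cbind(Intercept = 1, x), y = y)

# Same least-squares problem, two calling conventions.
coef(fit_formula)
fit_matrix$coefficients
```

Both calls solve the same ordinary least squares problem; only the way the data is handed over differs, which is the inconsistency parsnip aims to hide.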



6.1.2 Model Specification


  • {tidymodels}/{parsnip} - Philosophy is to unify & make interfaces more predictable.
    • Specify model type (e.g. linear regression, random forest …)
      • linear_reg()
      • rand_forest()
    • Specify engine (i.e. package implementation of algorithm)
      • set_engine("some package's implementation")
    • Declare mode (e.g. classification vs regression)
      • use this when model can do both classification & regression
      • set_mode("regression")
      • set_mode("classification")
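
As a minimal sketch of all three steps for a model type that supports both modes, a random forest specification might look like the following. The choice of the "ranger" engine and of 500 trees is an assumption made for illustration.

```r
library(parsnip)

rf_model_spec <-
  rand_forest(trees = 500) %>%  # specify model type
  set_engine("ranger") %>%      # set engine (illustrative choice)
  set_mode("regression")        # declare mode, since rand_forest() supports both

rf_model_spec
```

Nothing is fitted at this point; the specification only records the model type, engine, and mode.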


  • Bringing it all together
lm_model_spec <- 
  linear_reg() %>% # specify model
  set_engine("lm") # set engine


lm_model_spec
## Linear Regression Model Specification (regression)
## 
## Computational engine: lm



6.1.3 Model Fitting

We reuse the model specification created above (lm_model_spec).


  • fit()
    • any nominal or categorical variables are almost always split out into dummy columns
    • most formula methods do the same thing
  • fit_xy()
    • passes the data as-is to the underlying model function, leaving any dummy-variable creation to that function
# create model fit using formula
lm_form_fit <- 
  lm_model_spec %>% 
  fit(Sale_Price ~ Longitude + Latitude, data = ames_train)


# create model fit using x/y
lm_xy_fit <- 
  lm_model_spec %>% 
  fit_xy(
    x = ames_train %>% select(Longitude, Latitude),
    y = ames_train %>% pull(Sale_Price)
    )
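
To see what "split out into dummy columns" means in practice, here is a small base-R sketch. The toy data frame and its column names are hypothetical; model.matrix() is the base-R function that performs this expansion for a formula.

```r
# Hypothetical toy data: one numeric outcome, one categorical predictor.
toy <- data.frame(
  price = c(100, 150, 120, 180),
  style = factor(c("ranch", "colonial", "ranch", "split"))
)

# The formula expands the factor into indicator (dummy) columns;
# the first level ("colonial") becomes the reference level.
dummies <- model.matrix(price ~ style, data = toy)
colnames(dummies)
```

A three-level factor becomes an intercept plus two indicator columns, which is the kind of expansion the formula interface applies before the model sees the data.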



6.1.4 Generalized Model Arguments

  • Like the varying interfaces, model parameters differ from implementation to implementation
  • Two levels of model arguments
    • main arguments - Parameters tied to the mathematical model itself, so they tend to be available across engines
    • engine arguments - Parameters specific to one package's implementation of the algorithm
argument               ranger          randomForest   sparklyr
sampled predictors     mtry            mtry           feature_subset_strategy
trees                  num.trees       ntree          num_trees
data points to split   min.node.size   nodesize       min_instances_per_node


argument               parsnip
sampled predictors     mtry
trees                  trees
data points to split   min_n
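
The two levels of arguments can be sketched as follows. Main arguments go into the model function using the parsnip names from the table, while engine arguments are passed through set_engine(). The specific values, and the use of ranger's importance argument as the engine-level example, are assumptions for illustration.

```r
library(parsnip)

rf_spec <-
  rand_forest(mtry = 3, trees = 1000, min_n = 5) %>%  # main arguments (parsnip names)
  set_engine("ranger", importance = "impurity") %>%   # engine argument (ranger-specific)
  set_mode("regression")

names(rf_spec$args)      # main arguments stored on the specification
names(rf_spec$eng_args)  # engine arguments stored separately
```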


Parsnip in Action


  • translate() shows how the parsnip interface maps to each individual package's implementation of the algorithm.
# stats implementation
linear_reg() %>% 
  set_engine("lm") %>% 
  translate()
## Linear Regression Model Specification (regression)
## 
## Computational engine: lm 
## 
## Model fit template:
## stats::lm(formula = missing_arg(), data = missing_arg(), weights = missing_arg())
# glmnet implementation
linear_reg(penalty = 1) %>% 
  set_engine("glmnet") %>% 
  translate()
## Linear Regression Model Specification (regression)
## 
## Main Arguments:
##   penalty = 1
## 
## Computational engine: glmnet 
## 
## Model fit template:
## glmnet::glmnet(x = missing_arg(), y = missing_arg(), weights = missing_arg(), 
##     family = "gaussian")
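
The same idea applies to the random forest arguments in the table above. As a hedged sketch (the argument values are arbitrary, and the printed template may differ between parsnip versions), translating a ranger specification should show the parsnip names trees and min_n mapped onto ranger's num.trees and min.node.size:

```r
library(parsnip)

# ranger implementation (values chosen only for illustration)
rf_translated <-
  rand_forest(trees = 1000, min_n = 5) %>%
  set_engine("ranger") %>%
  set_mode("regression") %>%
  translate()

rf_translated
```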