13.1 Regular and non-regular grids
Let’s consider an example model: an mlp
neural network model. The parameters marked for tuning are:
the number of hidden units,
the number of fitting epochs in model training, and
the amount of weight decay penalization.
Using parsnip
, the specification for a regression model fit using the nnet
package for a multi layer perceptron is:
<-
mlp_spec mlp(hidden_units = tune(),
penalty = tune(),
epochs = tune()) %>%
set_engine("nnet", trace = 0) %>%
set_mode("regression")
The argument trace = 0 prevents extra logging of the training process. The parameters()
function can extract the set of arguments with unknown values and set their dials
objects. extract_parameter_dials()
gives the current range of values.
<- parameters(mlp_spec)
mlp_param %>% extract_parameter_dials("hidden_units") mlp_param
## # Hidden Units (quantitative)
## Range: [1, 10]
%>% extract_parameter_dials("penalty") mlp_param
## Amount of Regularization (quantitative)
## Transformer: log-10 [1e-100, Inf]
## Range (transformed scale): [-10, 0]
For
penalty
, the random numbers are uniform on the log (base 10) scale. The values in the grid are in their natural units.
%>% extract_parameter_dials("epochs") mlp_param
## # Epochs (quantitative)
## Range: [10, 1000]
13.1.1 Regular Grids
The dials
package contains a set of grid_*()
functions that take the parameter object and produce different types of grids.
grid_regular(mlp_param, levels = 2)
## # A tibble: 8 × 3
## hidden_units penalty epochs
## <int> <dbl> <int>
## 1 1 0.0000000001 10
## 2 10 0.0000000001 10
## 3 1 1 10
## 4 10 1 10
## 5 1 0.0000000001 1000
## 6 10 0.0000000001 1000
## 7 1 1 1000
## 8 10 1 1000
The levels
argument is the number of levels per parameter to create. It can also take a named vector of values:
%>%
mlp_param grid_regular(levels = c(hidden_units = 3,
penalty = 2,
epochs = 2))
## # A tibble: 12 × 3
## hidden_units penalty epochs
## <int> <dbl> <int>
## 1 1 0.0000000001 10
## 2 5 0.0000000001 10
## 3 10 0.0000000001 10
## 4 1 1 10
## 5 5 1 10
## 6 10 1 10
## 7 1 0.0000000001 1000
## 8 5 0.0000000001 1000
## 9 10 0.0000000001 1000
## 10 1 1 1000
## 11 5 1 1000
## 12 10 1 1000
Regular grids can be computationally expensive to use, especially when there are a large number of tuning parameters. This is true for many models but not all. There are some models whose tuning time decreases with a regular grid. More on this in a moment.
One advantage of a regular grid is that the relationships between the tuning parameters and the model metrics are easily understood. The full factorial nature of designs allows for examination of each parameter separately.
13.1.2 Irregular Grids
Even with small grids, random values can still result in overlaps. Also, the random grid needs to cover the whole parameter space.
Even for a sample of 15 candidate points, this plot shows some overlap between points for our example:
library(ggforce)
set.seed(200)
%>%
mlp_param # The 'original = FALSE' option keeps penalty in log10 units
grid_random(size = 15, original = FALSE) %>%
ggplot(aes(x = .panel_x, y = .panel_y)) +
geom_point() +
geom_blank() +
facet_matrix(vars(hidden_units, penalty, epochs), layer.diag = 2) +
labs(title = "Random design with 15 candidates")
ggsave(filename = "images/13_grid_random.png")
A much better approach is to use designs called
13.1.2.1 Space Filling Designs !!!
They generally find a configuration of points that cover the parameter space with the smallest chance of overlapping. The dials
package has:
Latin hypercube
Maximum entropy
As with grid_random()
, the primary inputs are the number of parameter combinations and a parameter object.
library(ggforce)
set.seed(200)
%>%
mlp_param grid_latin_hypercube(size = 15, original = FALSE) %>%
ggplot(aes(x = .panel_x, y = .panel_y)) +
geom_point() +
geom_blank() +
facet_matrix(vars(hidden_units, penalty, epochs), layer.diag = 2) +
labs(title = "Latin Hypercube design with 15 candidates")
ggsave("images/13_latin_hypercube.png")
The default design used by tune
: maximum entropy design.