14.3 Simulated annealing

Simulated annealing is loosely related to annealing in metallurgy.

When hot, the atoms in the material are more free to move around, and, through random motion, tend to settle into better positions. A slow cooling brings the material to an ordered, crystalline state.

Page 128, Algorithms for Optimization, 2019.

If you were to cool it rapidly, the atoms would be frozen wherever they happened to be, and your blade (or whatever you're making) would turn out ugly and brittle.

14.3.1 How it works

At a high level, simulated annealing is an iterative process that:

  • starts with a single candidate parameter value

  • takes a random but constrained walk (a controlled random walk) through the parameter search space (the local neighborhood)

  • replaces the current value with the new candidate parameter value if the new candidate is better - i.e. leads to better performance

  • can still accept worse candidate values sometimes; however, it does so less and less as:

    • the drop in performance gets bigger

    • iterations increase

    Why would it do this? Max and Julia sum it up perfectly:

“The acceptance probabilities of simulated annealing allows the search to proceed in the wrong direction, at least for the short term, with the potential to find a much better region of the parameter space in the long run.”

We can illustrate this graphically:

We can imagine the green color - the acceptance probability - as the temperature. At the beginning, it's a real hot girl summer, we're throwing it back everywhere, accepting poor solutions left and right. As the temperature cools, cuffing season starts and we get wayyyy more selective. This is how simulated annealing works: poor candidate parameter values have a decent chance of being accepted in the earlier iterations, and the algorithm homes in on the optimal candidate values in the later iterations, when worse candidates are rarely accepted.
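For concreteness, here is a hypothetical, stripped-down sketch of that loop in base R. It is not finetune's implementation: score() stands in for "fit the model and measure performance on resamples", neighbor_of() stands in for "take a constrained random step", and the temperature schedule is made up.

set.seed(123)

score       <- function(x) -(x - 2)^2               # toy objective; higher is better
neighbor_of <- function(x) x + runif(1, -0.5, 0.5)  # small random step around x

current      <- 0                                   # a single starting candidate
current_perf <- score(current)

for (i in 1:50) {
  temp <- 1 / i                                     # "temperature" cools as iterations increase

  candidate      <- neighbor_of(current)
  candidate_perf <- score(candidate)

  # accept improvements outright; accept worse candidates with a probability
  # that shrinks as the performance drop grows and as the temperature cools
  accept <- candidate_perf > current_perf ||
    runif(1) < exp(-(current_perf - candidate_perf) / temp)

  if (accept) {
    current      <- candidate
    current_perf <- candidate_perf
  }
}

current  # typically ends up near 2, the maximizer of the toy objective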

On a more serious note, you might be wondering how this probability is worked out. See Max and Julia for the formal details (and the rough sketch just after this list). At a high level, it's influenced by:

  • iteration number

  • performance

  • user-specified constant:

    • a coefficient (cooling_coef) that can be changed from its default value of 0.02 in finetune::control_sim_anneal()
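As a rough sketch of its shape (see Max and Julia for the real formula; the constant and exact form here are assumptions for illustration), the probability decays roughly like exp(-iteration × coefficient × performance drop):

# rough sketch only -- not finetune's exact formula
accept_prob <- function(iter, pct_worse, coef = 0.02) {
  exp(-iter * coef * pct_worse)
}

accept_prob(iter = 1,  pct_worse = 2)   # early on, a mildly worse candidate is usually accepted
accept_prob(iter = 40, pct_worse = 2)   # the same drop later on is accepted far less often
accept_prob(iter = 40, pct_worse = 10)  # a much worse candidate later on is almost never accepted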

From earlier, we said simulated annealing searches for values within a search space, called the local neighborhood. This “neighborhood” is defined by a radius that fluctuates randomly over a range around the current point. Once a candidate is chosen in that neighborhood, it becomes the new “current point”, a new candidate is selected at random within the radius range around it, and so on. The following graph illustrates this process using the penalty parameter of a glmnet model.

For models with non-numeric parameters, we can instead assign a probability that controls how often the parameter value changes.
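To make the neighborhood idea concrete, here is a hypothetical sketch of generating one candidate; the numbers are made up, and the real logic in finetune is more involved (for example, numeric parameters are handled on scaled, transformed ranges):

set.seed(42)

current_penalty <- 0.1        # a numeric parameter, e.g. glmnet's penalty
current_kernel  <- "radial"   # a non-numeric parameter

# numeric parameter: step by a radius drawn from a range, in a random direction
radius      <- runif(1, min = 0.05, max = 0.15)
direction   <- sample(c(-1, 1), size = 1)
new_penalty <- current_penalty + direction * radius

# non-numeric parameter: change the value with a "flip" probability
flip_prob  <- 0.75
new_kernel <- if (runif(1) < flip_prob) sample(c("linear", "polynomial"), size = 1) else current_kernel

list(new_penalty = new_penalty, new_kernel = new_kernel)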

One last note: simulated annealing keeps going until it fails to find a better result within a pre-specified number of iterations. Max and Julia note that you should also set a restart threshold so that the process can restart from the last best result after it goes through a bad stretch.

14.3.2 The tune_sim_anneal() function

The tune_sim_anneal() function uses the generalized simulated annealing method of Bohachevsky, Johnson, and Stein (1986). There are more flavors in the literature, but this is the one that tidymodels uses. Important specifications (an illustrative call setting each of them follows the list) include:

  • no_improve: the number of iterations before it stops if it finds no improved results

  • restart: the number of iterations with no improvement allowed before the search restarts from the previous best result

  • radius: radius range on (0, 1) that defines the search space i.e. local neighborhood

  • flip: for non-numeric parameters, this is the probability for how often the parameter value changes

  • cooling_coef: dictates how quickly the acceptance probability decreases as we go through iterations. Larger coefficient values mean the probability of accepting a bad result decreases faster.
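To see these arguments in one place, here is an illustrative control object that sets each of them explicitly; the values are made up for illustration, not recommendations (the actual run below uses a simpler control object):

library(finetune)

ctrl_illustrative <- control_sim_anneal(
  no_improve   = 15L,            # stop after 15 iterations without a new best result
  restart      = 8L,             # restart from the last best result after 8 bad iterations
  radius       = c(0.05, 0.15),  # range of the local neighborhood radius, on (0, 1)
  flip         = 0.75,           # probability of changing a non-numeric parameter value
  cooling_coef = 0.02            # how quickly the acceptance probability decays
)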

Implementation is very similar to grid search and Bayesian optimization. We can print out the best results and visually assess how our search went across iterations.

# stop early if 10 consecutive iterations fail to find a better result
ctrl_sa <- control_sim_anneal(verbose = TRUE, no_improve = 10L)

set.seed(1234)

svm_sa <-
  svm_wflow %>%
  tune_sim_anneal(
    resamples = penguins_folds,   # resampling scheme used to measure performance
    metrics = roc_res,            # metric set (ROC AUC)
    initial = svm_initial,        # initial set of results to start the search from
    param_info = svm_param,       # parameter ranges that define the search space
    iter = 50,                    # maximum number of search iterations
    control = ctrl_sa
  )
show_best(svm_sa)
## # A tibble: 5 × 9
##    cost rbf_sigma .metric .estimator  mean     n std_err .config .iter
##   <dbl>     <dbl> <chr>   <chr>      <dbl> <int>   <dbl> <chr>   <int>
## 1 19.5     0.0199 roc_auc binary     0.965     5 0.0101  Iter27     27
## 2 14.2     0.0276 roc_auc binary     0.964     5 0.0105  Iter30     30
## 3  5.91    0.0930 roc_auc binary     0.964     5 0.00958 Iter29     29
## 4 25.0     0.0786 roc_auc binary     0.964     5 0.00874 Iter31     31
## 5  4.14    0.0666 roc_auc binary     0.963     5 0.0110  Iter23     23
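To visually assess the search across iterations, one option (assuming the autoplot() method for tuning results, which supports an iteration-based view for iterative search objects like this one) is:

# performance across iterations; type = "parameters" shows the parameter values instead
autoplot(svm_sa, type = "performance")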

Voilààà!