Minimization from Scratch

Goal: Starting from a point (x1, x2), find the minimum of the Rosenbrock function.

Approach: Use the function’s gradient.
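
The code below calls a function rosenbrock() that is assumed to have been defined earlier in the text. For completeness, a minimal sketch of a two-dimensional Rosenbrock function could look like this (the parameter values a = 1 and b = 5 are illustrative choices; with a = 1, the minimum lies at (1, 1)):

a <- 1 # illustrative parameter choices; adjust to match the definition used earlier
b <- 5

rosenbrock <- function(x) {
  x1 <- x[1]
  x2 <- x[2]
  (a - x1)^2 + b * (x2 - x1^2)^2
}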

Setup:

library(torch)

lr <- 0.01 # learning rate
num_iterations <- 1000

x <- torch_tensor(c(-1, 1), requires_grad = TRUE)

x is the parameter with respect to which we want to compute the function’s gradient. Thus, we set requires_grad = TRUE. We have arbitrarily chosen x = (-1, 1) as the starting point of our search.
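
To see what requires_grad = TRUE buys us before running the full loop, here is a minimal illustration on a throwaway copy of the starting point (the name x_demo is ours, used so that x itself enters the optimization with a clean grad field):

x_demo <- torch_tensor(c(-1, 1), requires_grad = TRUE) # throwaway copy, for illustration only
value <- rosenbrock(x_demo)
value$backward()
x_demo$grad # gradient of rosenbrock() at (-1, 1)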

Next we perform the minimization. For each iteration we will:

  1. Compute the value of rosenbrock() at the current value of x.

  2. Compute the gradient at x (i.e., the direction of steepest ascent).

  3. Take a step of size lr in the (negative) direction of the gradient.

  4. Repeat.
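
In pseudo-notation, steps 2 and 3 together implement the update x <- x - lr * g, where g denotes the gradient of rosenbrock() at the current x.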

A few things to point out about the code below:

  • We use the with_no_grad() function. Reason: because we set requires_grad = TRUE in the definition of x, torch would otherwise record every operation on x (including the update step itself) in the derivative calculation, which we don’t want.
  • Recall from Chapter 3 that x$sub_() (with a trailing underscore) modifies x in place. Similarly, x$grad$zero_() modifies x$grad in place.
  • We use x$grad$zero_() to zero out the grad field of x. By default, torch accumulates gradients: each call to backward() adds to grad instead of replacing it (see the short sketch after this list).
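
To see gradient accumulation in action, here is a minimal, self-contained sketch with a throwaway tensor y (not part of the original code):

y <- torch_tensor(2, requires_grad = TRUE) # throwaway tensor, for illustration only
(y * y)$backward()
y$grad # 4
(y * y)$backward()
y$grad # 8: the new gradient was added to the existing one, not substituted

With these points in mind, here is the minimization loop: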
for (i in 1:num_iterations) {
  if(i %% 200 == 0) cat("Iteration: ", i, "\n")
  
  # Compute value of function:
  value <- rosenbrock(x)
  if(i %% 200 == 0) cat("Value is: ", as.numeric(value), "\n")
  
  # Compute the gradient 
  value$backward()
  if(i %% 200 == 0) cat("Gradient is: ", as.matrix(x$grad), "\n\n")
  
  with_no_grad({
    x$sub_(lr * x$grad) # Take a step of size lr in the (negative) direction of the gradient
    x$grad$zero_() # Zero out grad field of x.
  })
}
## Iteration:  200 
## Value is:  0.07398106 
## Gradient is:  -0.1603189 -0.2532476 
## 
## Iteration:  400 
## Value is:  0.009619333 
## Gradient is:  -0.04347242 -0.08254051 
## 
## Iteration:  600 
## Value is:  0.001719962 
## Gradient is:  -0.01683905 -0.03373682 
## 
## Iteration:  800 
## Value is:  0.0003393509 
## Gradient is:  -0.007221781 -0.01477957 
## 
## Iteration:  1000 
## Value is:  6.962555e-05 
## Gradient is:  -0.003222887 -0.006653666

Let’s check the value of x:

x
## torch_tensor
##  0.9918
##  0.9830
## [ CPUFloatType{2} ][ requires_grad = TRUE ]

It’s close to (1, 1), the true minimum!

Exercise: What kind of difference does the learning rate make? Try lr = 0.001 and lr = 0.1.
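
One way to run the experiment is to wrap the descent loop in a small helper; a minimal sketch (the function name minimize is ours, not from the text):

# Illustrative helper: run gradient descent on rosenbrock() for a given learning rate.
minimize <- function(lr, num_iterations = 1000) {
  x <- torch_tensor(c(-1, 1), requires_grad = TRUE)
  for (i in 1:num_iterations) {
    value <- rosenbrock(x)
    value$backward()
    with_no_grad({
      x$sub_(lr * x$grad)
      x$grad$zero_()
    })
  }
  x
}

minimize(0.001)
minimize(0.1)

Compare how close each run ends up to the true minimum at (1, 1).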