Minimization from Scratch
Goal: Starting from a point (x1, x2), find the minimum of the Rosenbrock function.
Approach: Use the function’s gradient.
Setup:
library(torch)
lr <- 0.01 # learning rate
num_iterations <- 1000
x <- torch_tensor(c(-1, 1), requires_grad = TRUE)
x is the parameter with respect to which we want to compute the function's derivative; thus, we set requires_grad = TRUE. We have arbitrarily chosen x = (-1, 1) as the starting point of our search.
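The rosenbrock function itself is assumed to have been defined earlier in the chapter. For reference, here is a minimal sketch of a definition consistent with the values printed below; the coefficients a = 1 and b = 5 are an assumption, as they are not shown in this section:

rosenbrock <- function(x) {
  x1 <- x[1]
  x2 <- x[2]
  # assumed coefficients: a = 1, b = 5
  (1 - x1)^2 + 5 * (x2 - x1^2)^2
}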
Next, we perform the minimization. For each iteration we will:

- Compute the value of the rosenbrock function at the current value of x.
- Compute the gradient at x (i.e., the direction of steepest ascent).
- Take a step of size lr in the (negative) direction of the gradient.
- Repeat.
A few things to point out about the code below:

- We use the with_no_grad() function. Reason: because we set requires_grad = TRUE in the definition of x, torch will include all operations on x (including this one) in the derivative calculation, which we don't want.
- Recall from Chapter 3 that x$sub_() (with an underscore) will modify the value of x in place. Similarly, x$grad$zero_() will modify x's grad field in place.
- We use x$grad$zero_() to zero out the grad field of x. By default, torch accumulates gradients (see the short illustration after this list).
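To see what "accumulates" means here, a quick standalone illustration; the tensor y and the squaring operation are made up just for this example:

y <- torch_tensor(2, requires_grad = TRUE)

value <- y^2        # d(y^2)/dy = 2 * y, i.e. 4 at y = 2
value$backward()
y$grad              # 4

value <- y^2
value$backward()
y$grad              # 8: the new gradient was added to the existing one

y$grad$zero_()      # reset, as done at the end of each loop iteration below
y$grad              # 0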
for (i in 1:num_iterations) {
  if (i %% 200 == 0) cat("Iteration: ", i, "\n")

  # Compute value of the function:
  value <- rosenbrock(x)
  if (i %% 200 == 0) cat("Value is: ", as.numeric(value), "\n")

  # Compute the gradient:
  value$backward()
  if (i %% 200 == 0) cat("Gradient is: ", as.matrix(x$grad), "\n\n")

  with_no_grad({
    x$sub_(lr * x$grad)  # take a step of size lr in the (negative) direction of the gradient
    x$grad$zero_()       # zero out the grad field of x
  })
}
## Iteration: 200
## Value is: 0.07398106
## Gradient is: -0.1603189 -0.2532476
##
## Iteration: 400
## Value is: 0.009619333
## Gradient is: -0.04347242 -0.08254051
##
## Iteration: 600
## Value is: 0.001719962
## Gradient is: -0.01683905 -0.03373682
##
## Iteration: 800
## Value is: 0.0003393509
## Gradient is: -0.007221781 -0.01477957
##
## Iteration: 1000
## Value is: 6.962555e-05
## Gradient is: -0.003222887 -0.006653666
Let's check the value of x:

x
## torch_tensor
## 0.9918
## 0.9830
## [ CPUFloatType{2} ][ requires_grad = TRUE ]
It's close to (1, 1), the true minimum!
Exercise: What kind of difference does the learning rate make? Try lr = 0.001 and lr = 0.1, respectively.
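One way to set up this experiment is to wrap the loop above in a small helper; minimize_rosenbrock is just a name introduced here for convenience, not part of the chapter's code:

minimize_rosenbrock <- function(lr, num_iterations = 1000) {
  x <- torch_tensor(c(-1, 1), requires_grad = TRUE)
  for (i in 1:num_iterations) {
    value <- rosenbrock(x)
    value$backward()
    with_no_grad({
      x$sub_(lr * x$grad)
      x$grad$zero_()
    })
  }
  x
}

minimize_rosenbrock(lr = 0.001)
minimize_rosenbrock(lr = 0.1)

Compare how close each run gets to (1, 1) after the same number of iterations.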