Linesearches for Optimizers
In GeometricOptimizers we typically build the search direction by multiplying the gradient with the inverse of the Hessian. Starting at $x_k$ we take:
\[ p_k = H_{x_k}^{-1}(\nabla_{x_k}f),\]
where $[H_{x_k}]_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}\Big|_{x_k}$ is the Hessian. Note that we often use approximations of this Hessian in practice (such as the HessianBFGS).
The linesearch objective is then built as
\[f^\mathrm{ls}(\alpha) = f(x_k + \alpha{}p_k).\]
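The two formulas above can be made concrete with a minimal plain-Julia sketch, independent of the GeometricOptimizers API; all names here (`q`, `∇q`, `Hq`, `qˡˢ`) are illustrative, and the objective is a toy quadratic with a known gradient and Hessian:

```julia
using LinearAlgebra

# Toy quadratic objective q(x) = x'Ax/2 - b'x with analytic gradient and Hessian.
A = [4.0 1.0; 1.0 3.0]       # symmetric positive definite
b = [1.0, 2.0]
q(x)  = dot(x, A * x) / 2 - dot(b, x)
∇q(x) = A * x - b            # gradient of q
Hq(x) = A                    # Hessian of q (constant for a quadratic)

xₖ = [0.0, 0.0]              # current iterate
pₖ = Hq(xₖ) \ ∇q(xₖ)         # pₖ = H⁻¹(∇f), as in the formula above

# The associated one-dimensional linesearch objective fˡˢ(α) = f(xₖ + α pₖ):
qˡˢ(α) = q(xₖ + α * pₖ)
```

With this sign convention $p_k$ points uphill, so the minimizing step length is negative (here exactly $\alpha = -1$, since the model is quadratic); the same effect produces the negative $\alpha_1$ in the worked example below.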
On a manifold [7], defining a Hessian, analogously to defining a gradient, requires a Riemannian metric and the associated Levi-Civita connection $\nabla$:
\[\mathrm{Hess}(f) := \nabla\nabla{}f = \nabla{}df \in \Gamma(T^*\mathcal{M}\otimes{}T^*\mathcal{M}).\]
For vector fields $\xi, \eta \in \Gamma(T\mathcal{M})$ we can write this as:
\[\langle \mathrm{Hess}(f)[\xi], \eta \rangle = \xi(\eta{}f) - (\nabla_\xi\eta)f.\]
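As a quick consistency check of this formula: on flat $\mathbb{R}^n$ with the Euclidean metric the Levi-Civita connection has vanishing Christoffel symbols, so for coordinate vector fields $\xi = \partial_i$ and $\eta = \partial_j$ the second term drops out:
\[ \langle \mathrm{Hess}(f)[\partial_i], \partial_j \rangle = \partial_i(\partial_j f) - (\nabla_{\partial_i}\partial_j)f = \frac{\partial^2 f}{\partial x_i \partial x_j}, \]
which recovers the coordinate Hessian $[H_{x_k}]_{ij}$ from above.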
Example
We look at the following example:
f(x::Union{T, Vector{T}}) where {T<:Number} = exp.(x) .* (x .^ 3 .- 5x .+ 2x) .+ 2one(T)
f!(y::AbstractVector{T}, x::AbstractVector{T}) where {T} = y .= f.(x)
F!(y::AbstractVector{T}, x::AbstractVector{T}, params) where {T} = f!(y, x)

We hence use linesearch_problem not for a SimpleSolvers.NewtonSolver, but for an EuclideanOptimizer:
x₀ = [0., .1, .2]
x = copy(x₀)
obj = OptimizerProblem(sum∘f, x₀)
grad = GradientAutodiff{Float64}(obj.F, length(x₀))
_cache = NewtonOptimizerCache(x₀)
state = NewtonOptimizerState(x₀)
hess = HessianAutodiff(obj, x₀)
update!(state, grad, x₀)
update!(_cache, state, grad, hess, x₀)
params = (x = state.x, )
ls_obj = linesearch_problem(obj, grad, _cache)
fˡˢ(alpha) = ls_obj.F(alpha, params)
∂fˡˢ∂α(alpha) = ls_obj.D(alpha, params)
Note the different shape of the linesearch objective in the optimizer case; in particular, it can take negative values here!
We now again want to find the minimum with quadratic line search and repeat the procedure above:
p₀ = fˡˢ(0.)   # 4.9464834626645615
p₁ = ∂fˡˢ∂α(0.)   # 6.452258296067367
params = (x = state.x, )
α₀ = bracket_minimum_with_fixed_point(ls_obj, params, 0.)[1]
y = fˡˢ(α₀)
p₂ = (y - p₀ - p₁*α₀) / α₀^2
p(α) = p₀ + p₁ * α + p₂ * α^2
α₁ = -p₁ / (2p₂)   # -0.04213770599606998
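The quadratic interpolation used above does not depend on the SimpleSolvers machinery; here is a self-contained sketch of the same procedure on a toy one-dimensional objective (the names `q₀`, `q₁`, `q₂`, `a₀`, `a₁` mirror `p₀`, `p₁`, `p₂`, `α₀`, `α₁` but are kept distinct so the example does not clobber the variables of the main script):

```julia
# Fit p(α) = q₀ + q₁α + q₂α² to the value and slope at α = 0 plus one extra
# sample at a trial step a₀, then jump to the vertex of the parabola.
φ(α)  = (α - 1)^2 + 0.5      # toy 1-D objective with minimum at α = 1
∂φ(α) = 2(α - 1)             # its derivative

q₀ = φ(0.0)                  # value at α = 0
q₁ = ∂φ(0.0)                 # slope at α = 0
a₀ = 0.5                     # trial step (obtained by bracketing in the text)
q₂ = (φ(a₀) - q₀ - q₁ * a₀) / a₀^2
a₁ = -q₁ / (2q₂)             # minimizer of the interpolating parabola
```

Because the toy objective is itself quadratic, the fit is exact and `a₁` lands on the true minimizer; for general objectives the parabola is only a local model and the resulting step is typically safeguarded.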
We now again move the original $x$ in the Newton direction with step length $\alpha_1$:
(sum∘f)(x)   # 4.9464834626645615
compute_new_iterate!(x, α₁, direction(_cache))
# 3-element Vector{Float64}:
#  0.02106885299803499
#  0.12442776394771286
#  0.2283971496930037
(sum∘f)(x)   # ≈ 4.6689, smaller than before