Quadratic Line Search

Quadratic line search is based on making a quadratic approximation to an objective and then pick the minimum of this quadratic approximation as the next iteration of $\alpha$.

The quadratic polynomial is built the following way^[1]:

\[p(\alpha) = f^\mathrm{ls}(0) + (f^\mathrm{ls})'(0)\alpha + p_2\alpha^2,\]

and we also call $p_0:=f^\mathrm{ls}(0)$ and $p_1:=(f^\mathrm{ls})'(0)$. The coefficient $p_2$ is then determined the following way:

take a value $\alpha$ (typically initialized as SimpleSolvers.DEFAULT_ARMIJO_α₀) and compute $y = f^\mathrm{ls}(\alpha)$,
set $p_2 \gets \frac{(y^2 - p_0 - p_1\alpha)}{\alpha^2}.$

After the polynomial is found we then take its minimum (analogously to the Bierlaire quadratic line search) and check if it satisfies the sufficient decrease condition. If it does not satisfy this condition we repeat the process, but with the current $\alph$ as the starting point for the line search (instead of the initial SimpleSolvers.DEFAULT_ARMIJO_α₀).

Example

Here we treat the following problem:

f(x::Union{T, Vector{T}}) where {T<:Number} = exp.(x) .* (x .^ 3 - 5x + 2x) .+ 2one(T)
f!(y::AbstractVector{T}, x::AbstractVector{T}) where {T} = y .= f.(x)
F!(y::AbstractVector{T}, x::AbstractVector{T}, params) where {T} = f!(y, x)

We now want to use quadratic line search to find the root of this function starting at $x = 0$. We compute the Jacobian of $f$ and initialize a line search objective:

using SimpleSolvers

J!(j::AbstractMatrix{T}, x::AbstractVector{T}, params) where {T} = SimpleSolvers.ForwardDiff.jacobian!(j, f, x)
x = [0.]
# allocate solver
solver = NewtonSolver(x, f(x); F = F!)
# initialize solver
params = nothing
update!(solver, x, params)
compute_jacobian!(solver, x, J!, params; mode = :function)

# compute rhs
F!(cache(solver).rhs, x, params)
rmul!(cache(solver).rhs, -1)

# multiply rhs with jacobian
factorize!(linearsolver(solver), jacobian(solver))
ldiv!(direction(cache(solver)), linearsolver(solver), cache(solver).rhs)
nls = NonlinearSystem(F!, x, f(x))
ls_obj = linesearch_objective(nls, cache(solver), params)
fˡˢ = ls_obj.F
∂fˡˢ∂α = ls_obj.D

┌ Warning: This function should not be called! Instead call `compute_jacobian!(s, x, params)`.
└ @ SimpleSolvers ~/work/SimpleSolvers.jl/SimpleSolvers.jl/src/nonlinear/newton_solver.jl:187

The first two coefficient of the polynomial $p$ (i.e. $p_1$ and $p_2$) are easy to compute:

p₀ = fˡˢ(0.)
p₁ = ∂fˡˢ∂α(0.)

Initializing $\alpha$

In order to compute $p_2$ we first have to initialize $\alpha$. We start by guessing an initial $\alpha$ as SimpleSolvers.DEFAULT_ARMIJO_α₀. If this initial alpha does not satisfy the SimpleSolvers.BracketMinimumCriterion, i.e. it holds that $f^\mathrm{ls}(\alpha_0) > f^\mathrm{ls}(0)$, we call SimpleSolvers.bracket_minimum_with_fixed_point (similarly to calling SimpleSolvers.bracket_minimum for standard bracketing).

Looking at SimpleSolvers.DEFAULT_ARMIJO_α₀, we see that the SimpleSolvers.BracketMinimumCriterion is not satisfied:

We therefore see that calling SimpleSolvers.determine_initial_α returns a different $\alpha$ (the result of calling SimpleSolvers.bracket_minimum_with_fixed_point):

α₀ = determine_initial_α(ls_obj, SimpleSolvers.DEFAULT_ARMIJO_α₀)

1.28

We can now finally compute $p_2$ and determine the minimum of the polynomial:

y = fˡˢ(α₀)
p₂ = (y - p₀ - p₁*α₀) / α₀^2
p(α) = p₀ + p₁ * α + p₂ * α^2
αₜ = -p₁ / (2p₂)

0.5141386947276962

When using SimpleSolvers.QuadraticState we in addition call SimpleSolvers.adjust_α:

α₁ = adjust_α(αₜ, α₀)

0.5141386947276962

We now check wether $\alpha_1$ satisfies the sufficient decrease condition:

sdc = SufficientDecreaseCondition(DEFAULT_WOLFE_c₁, 0., fˡˢ(0.), derivative(ls_obj, 0.), 1., ls_obj)
sdc(α₁)

true

We now move the original $x$ in the Newton direction with step length $\alpha_1$ by using SimpleSolvers.compute_new_iterate:

x .= compute_new_iterate(x, α₁, direction(cache(solver)))

1-element Vector{Float64}:
 0.3427591298184641

And we see that we already very close to the root.

Example for Optimization

We look again at the same example as before, but this time we want to find a minimum and not a root. We hence use SimpleSolvers.linesearch_objective not for a NewtonSolver, but for an Optimizer:

using SimpleSolvers: NewtonOptimizerCache, initialize!, gradient

x₀, x₁ = [0.], x
obj = MultivariateObjective(sum∘f, x₀)
gradient!(obj, x₀)
value!(obj, x₀)
_cache = NewtonOptimizerCache(x₀)
hess = Hessian(obj, x₀; mode = :autodiff)
update!(hess, x₀)
update!(_cache, x₀, gradient(obj), hess)
gradient!(obj, x₁)
value!(obj, x₁)
update!(hess, x₁)
update!(_cache, x₁, gradient(obj), hess)
ls_obj = linesearch_objective(obj, _cache)

fˡˢ = ls_obj.F
∂fˡˢ∂α = ls_obj.D

Info

Note the different shape of the line search objective in the case of the optimizer, especially that the line search objective can take negative values in this case!

We now again want to find the minimum with quadratic line search and repeat the procedure above:

p₀ = fˡˢ(0.)

0.6080642672555336

p₁ = ∂fˡˢ∂α(0.)

4.405693326634148

α₀ = determine_initial_α(ls_obj, SimpleSolvers.DEFAULT_ARMIJO_α₀)
y = fˡˢ(α₀)
p₂ = (y - p₀ - p₁*α₀) / α₀^2
p(α) = p₀ + p₁ * α + p₂ * α^2
αₜ = -p₁ / (2p₂)
α₁ = adjust_α(αₜ, α₀)

-1.275

What we see here is that we do not use $\alpha_t = -p_1 / (2p_2)$ as SimpleSolvers.adjust_α instead picks the left point in the interval $[\sigma_0\alpha_0, \sigma_1\alpha_0]$ as the change computed with $\alpha_t$ would be too small.

We now again move the original $x$ in the Newton direction with step length $\alpha_1$:

x .= compute_new_iterate(x, α₁, direction(_cache))

1-element Vector{Float64}:
 1.4394773498026217

We make another iteration:

gradient!(obj, x)
value!(obj, x)
update!(hess, x)
update!(_cache, x, gradient(obj), hess)
ls_obj = linesearch_objective(obj, _cache)

fˡˢ = ls_obj.F
∂fˡˢ∂α = ls_obj.D
p₀ = fˡˢ(0.)
p₁ = ∂fˡˢ∂α(0.)
α₀⁽²⁾ = determine_initial_α(ls_obj, SimpleSolvers.DEFAULT_ARMIJO_α₀)
y = fˡˢ(α₀)
p₂ = (y - p₀ - p₁*α₀⁽²⁾) / α₀⁽²⁾^2
p(α) = p₀ + p₁ * α + p₂ * α^2
αₜ = -p₁ / (2p₂)

1.0682684767171837

α₂ = adjust_α(αₜ, α₀⁽²⁾)

1.0682684767171837

We see that for $\alpha_2$ (as opposed to $\alpha_1$) we have $\alpha_2 = \alpha_t$ as $\alpha_t$ is in (this is what SimpleSolvers.adjust_α checks for):

(DEFAULT_ARMIJO_σ₀ * α₀⁽²⁾, DEFAULT_ARMIJO_σ₁ * α₀⁽²⁾)

(0.512, 2.56)

x .= compute_new_iterate(x, α₂, direction(_cache))

1-element Vector{Float64}:
 1.2931972894162747

We finally compute a third iterate:

gradient!(obj, x)
value!(obj, x)
update!(hess, x)
update!(_cache, x, gradient(obj), hess)
ls_obj = linesearch_objective(obj, _cache)

fˡˢ = ls_obj.F
∂fˡˢ∂α = ls_obj.D
p₀ = fˡˢ(0.)
p₁ = ∂fˡˢ∂α(0.)
α₀⁽³⁾ = determine_initial_α(ls_obj, SimpleSolvers.DEFAULT_ARMIJO_α₀)
y = fˡˢ(α₀)
p₂ = (y - p₀ - p₁*α₀⁽³⁾) / α₀^2
p(α) = p₀ + p₁ * α + p₂ * α^2
αₜ = -p₁ / (2p₂)
α₃ = adjust_α(αₜ, α₀⁽³⁾)

0.3759474782302077

x .= compute_new_iterate(x, α₃, direction(_cache))

1-element Vector{Float64}:
 1.281997845228676

Example II

Here we consider the same example as when discussing the Bierlaire quadratic line search.

ls_obj = linesearch_objective(nls, cache(solver), params)
fˡˢ = ls_obj.F
∂fˡˢ∂α = ls_obj.D

We now try to find a minimum of $f^\mathrm{ls}$ with quadratic line search. For this we first need to find a bracket; we again do this with SimpleSolvers.bracket_minimum_with_fixed_point^[2]:

(a, b) = SimpleSolvers.bracket_minimum_with_fixed_point(fˡˢ, 0.)

(0.0, 5.12)

We plot the bracket:

using CairoMakie
mred = RGBf(214 / 256, 39 / 256, 40 / 256)
mpurple = RGBf(148 / 256, 103 / 256, 189 / 256)
mgreen = RGBf(44 / 256, 160 / 256, 44 / 256)
mblue = RGBf(31 / 256, 119 / 256, 180 / 256)
morange = RGBf(255 / 256, 127 / 256, 14 / 256)

fig = Figure()
ax = Axis(fig[1, 1])
alpha = -2.5:.01:3.
lines!(ax, alpha, fˡˢ.(alpha); label = L"f^\mathrm{ls}(\alpha)")
scatter!(ax, a, fˡˢ(a); color = mred, label = L"a")
scatter!(ax, b, fˡˢ(b); color = mpurple, label = L"b")
ylims!(ax, (-1., 6.))
axislegend(ax)
save("f_ls_1.png", fig)

We now build the polynomial:

p₀ = fˡˢ(a)
p₁ = ∂fˡˢ∂α(a)
y = fˡˢ(b)
p₂ = (y - p₀ - p₁*b) / b^2
p(α) = p₀ + p₁ * α + p₂ * α^2

and compute its minimum:

αₜ = -p₁ / (2p₂)

2.446514989327422

lines!(ax, alpha, p.(alpha); label = L"p(\alpha)")
scatter!(ax, αₜ, p(αₜ); label = L"\alpha_t")
ylims!(ax, (-1., 6.))
axislegend(ax)
save("f_ls_2.png", fig)

We now set $a \gets \alpha_t$ and perform another iteration:

(a, b) = SimpleSolvers.bracket_minimum_with_fixed_point(fˡˢ, αₜ)

(-0.1034850106725782, 2.4565149893274216)

We again build the polynomial:

p₀ = fˡˢ(a)
p₁ = ∂fˡˢ∂α(a)
y = fˡˢ(b)
p₂ = (y - p₀ - p₁*(b-a)) / (b-a)^2
p(α) = p₀ + p₁ * (α-a) + p₂ * (α-a)^2

and compute its minimum:

αₜ = -p₁ / (2p₂) + a

1.285313587994457

1This is different from the Bierlaire quadratic polynomial described in [2].
2Here we use SimpleSolvers.bracket_minimum_with_fixed_point directly instead of using SimpleSolvers.determine_initial_α.