Quadratic Line Search
Quadratic line search is based on making a quadratic approximation to the objective and then picking the minimum of this quadratic approximation as the next iterate of $\alpha$.
The quadratic polynomial is built in the following way[1]:
\[p(\alpha) = f^\mathrm{ls}(0) + (f^\mathrm{ls})'(0)\alpha + p_2\alpha^2,\]
where we write $p_0 := f^\mathrm{ls}(0)$ and $p_1 := (f^\mathrm{ls})'(0)$. The coefficient $p_2$ is then determined as follows:
- take a value $\alpha$ (typically initialized as SimpleSolvers.DEFAULT_ARMIJO_α₀) and compute $y = f^\mathrm{ls}(\alpha)$,
- set $p_2 \gets \frac{y - p_0 - p_1\alpha}{\alpha^2}$.
After the polynomial is found we take its minimum (analogously to the Bierlaire quadratic line search) and check whether it satisfies the sufficient decrease condition. If it does not satisfy this condition we repeat the process, but with the current $\alpha$ as the starting point for the line search (instead of the initial SimpleSolvers.DEFAULT_ARMIJO_α₀).
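To make the procedure concrete, here is a minimal sketch of how such a quadratic model could be built and minimized for a given line search objective. The names f_ls and df_ls stand for $f^\mathrm{ls}$ and $(f^\mathrm{ls})'$ and are placeholders for this sketch, not part of the SimpleSolvers API:
# hypothetical sketch of one quadratic-model step; not the SimpleSolvers implementation
function quadratic_model_minimizer(f_ls, df_ls, α)
    p₀, p₁ = f_ls(0.), df_ls(0.)     # value and slope of the line search objective at α = 0
    y  = f_ls(α)                     # one trial evaluation at the current α
    p₂ = (y - p₀ - p₁ * α) / α^2     # fit p(α) = p₀ + p₁α + p₂α² through the available data
    -p₁ / (2p₂)                      # minimizer of the quadratic model (assuming p₂ > 0)
end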
Example
Here we treat the following problem:
f(x::Union{T, Vector{T}}) where {T<:Number} = exp.(x) .* (x .^ 3 - 5x + 2x) .+ 2one(T)
f!(y::AbstractVector{T}, x::AbstractVector{T}) where {T} = y .= f.(x)
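As a quick sanity check (not part of the original example), evaluating $f$ at zero gives $\exp(0)\cdot 0 + 2 = 2$:
f(0.)   # == 2.0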
We now want to use quadratic line search to find the root of this function starting at $x = 0$. We compute the Jacobian of $f$ and initialize a line search objective:
using SimpleSolvers
using LinearAlgebra: rmul!, ldiv!
j!(j::AbstractMatrix{T}, x::AbstractVector{T}) where {T} = SimpleSolvers.ForwardDiff.jacobian!(j, f, x)
x = [0.]
# allocate solver
solver = NewtonSolver(x, f(x); F = f)
# initialize solver
update!(solver, x)
compute_jacobian!(solver, x, j!; mode = :function)
# compute rhs
f!(cache(solver).rhs, x)
rmul!(cache(solver).rhs, -1)
# multiply rhs with jacobian
factorize!(linearsolver(solver), jacobian(cache(solver)))
ldiv!(direction(cache(solver)), linearsolver(solver), cache(solver).rhs)
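# build the line search objective from the multivariate objective, the Jacobian and the Newton solver cache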
obj = MultivariateObjective(f, x)
ls_obj = linesearch_objective(obj, JacobianFunction(j!, x), cache(solver))
fˡˢ = ls_obj.F
∂fˡˢ∂α = ls_obj.D
The first two coefficients of the polynomial $p$ (i.e. $p_0$ and $p_1$) are easy to compute:
p₀ = fˡˢ(0.)
p₁ = ∂fˡˢ∂α(0.)
Initializing $\alpha$
In order to compute $p_2$ we first have to initialize $\alpha$. We start by guessing an initial $\alpha$ as SimpleSolvers.DEFAULT_ARMIJO_α₀. If this initial $\alpha$ does not satisfy the SimpleSolvers.BracketMinimumCriterion, i.e. it holds that $f^\mathrm{ls}(\alpha_0) > f^\mathrm{ls}(0)$, we call SimpleSolvers.bracket_minimum_with_fixed_point (similarly to calling SimpleSolvers.bracket_minimum for standard bracketing).
Looking at SimpleSolvers.DEFAULT_ARMIJO_α₀, we see that the SimpleSolvers.BracketMinimumCriterion is not satisfied.
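Written out explicitly (as a direct check of the inequality above rather than through the SimpleSolvers API), this means:
fˡˢ(SimpleSolvers.DEFAULT_ARMIJO_α₀) > fˡˢ(0.)   # true, so the criterion is not satisfied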
We therefore see that calling SimpleSolvers.determine_initial_α returns a different $\alpha$ (the result of calling SimpleSolvers.bracket_minimum_with_fixed_point):
α₀ = determine_initial_α(ls_obj, SimpleSolvers.DEFAULT_ARMIJO_α₀)
1.28
We can now finally compute $p_2$ and determine the minimum of the polynomial; setting $p'(\alpha) = p_1 + 2p_2\alpha = 0$ gives $\alpha_t = -p_1 / (2p_2)$ (provided $p_2 > 0$):
y = fˡˢ(α₀)
p₂ = (y - p₀ - p₁*α₀) / α₀^2
p(α) = p₀ + p₁ * α + p₂ * α^2
αₜ = -p₁ / (2p₂)
0.5141386947276962
When using SimpleSolvers.QuadraticState we additionally call SimpleSolvers.adjust_α:
α₁ = adjust_α(αₜ, α₀)
0.5141386947276962
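Here $\alpha_t$ is returned unchanged because it already lies inside the interval that SimpleSolvers.adjust_α checks; the same interval is written out explicitly in the optimization example below, and we can mirror that check here:
DEFAULT_ARMIJO_σ₀ * α₀ < αₜ < DEFAULT_ARMIJO_σ₁ * α₀   # true, hence α₁ == αₜ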
We now check whether $\alpha_1$ satisfies the sufficient decrease condition:
sdc = SufficientDecreaseCondition(DEFAULT_WOLFE_c₁, 0., fˡˢ(0.), derivative(ls_obj, 0.), 1., ls_obj)
sdc(α₁)
true
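For reference, the condition encoded by sdc can also be written out directly, assuming the standard Armijo form of sufficient decrease and the arguments $x_k = 0$, $p_k = 1$ passed to the constructor above:
fˡˢ(α₁) ≤ fˡˢ(0.) + DEFAULT_WOLFE_c₁ * α₁ * ∂fˡˢ∂α(0.)   # the same check, written out explicitly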
We now move the original $x$ in the Newton direction with step length $\alpha_1$ by using SimpleSolvers.compute_new_iterate:
x .= compute_new_iterate(x, α₁, direction(cache(solver)))
1-element Vector{Float64}:
0.3427591298184641
And we see that we are already very close to the root.
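Evaluating $f$ at the new iterate shows the progress made by this single step; the value ($\approx 0.608$, down from $2$ at the starting point) also reappears below as $f^\mathrm{ls}(0)$ in the optimization example, which starts from this iterate:
f(x)   # ≈ [0.608]; compare f([0.]) == [2.0]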
Example for Optimization
We look again at the same example as before, but this time we want to find a minimum and not a root. We hence use SimpleSolvers.linesearch_objective not for a NewtonSolver, but for an Optimizer:
using SimpleSolvers: NewtonOptimizerCache, initialize!, gradient
x₀, x₁ = [0.], x
obj = MultivariateObjective(sum∘f, x₀)
gradient!(obj, x₀)
value!(obj, x₀)
_cache = NewtonOptimizerCache(x₀)
hess = Hessian(obj, x₀; mode = :autodiff)
update!(hess, x₀)
update!(_cache, x₀, gradient(obj), hess)
gradient!(obj, x₁)
value!(obj, x₁)
update!(hess, x₁)
update!(_cache, x₁, gradient(obj), hess)
ls_obj = linesearch_objective(obj, _cache)
fˡˢ = ls_obj.F
∂fˡˢ∂α = ls_obj.D
Note the different shape of the line search objective in the case of the optimizer, especially that the line search objective can take negative values in this case!
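One simple way to see this (not part of the original example) is to evaluate the line search objective at a few trial step sizes; the exact values depend on the current iterate and search direction:
[fˡˢ(α) for α in (-2., -1., 0., 1., 2.)]   # sample the line search objective along the search direction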
We now again want to find the minimum with quadratic line search and repeat the procedure above:
p₀ = fˡˢ(0.)
0.6080642672555336
p₁ = ∂fˡˢ∂α(0.)
4.405693326634148
α₀ = determine_initial_α(ls_obj, SimpleSolvers.DEFAULT_ARMIJO_α₀)
y = fˡˢ(α₀)
p₂ = (y - p₀ - p₁*α₀) / α₀^2
p(α) = p₀ + p₁ * α + p₂ * α^2
αₜ = -p₁ / (2p₂)
α₁ = adjust_α(αₜ, α₀)
-1.275
What we see here is that we do not use $\alpha_t = -p_1 / (2p_2)$: since the change computed with $\alpha_t$ would be too small, SimpleSolvers.adjust_α instead picks the left point of the interval $[\sigma_0\alpha_0, \sigma_1\alpha_0]$.
We now again move the original $x$ in the Newton direction with step length $\alpha_1$:
x .= compute_new_iterate(x, α₁, direction(_cache))
1-element Vector{Float64}:
1.4394773498026217
We make another iteration:
gradient!(obj, x)
value!(obj, x)
update!(hess, x)
update!(_cache, x, gradient(obj), hess)
ls_obj = linesearch_objective(obj, _cache)
fˡˢ = ls_obj.F
∂fˡˢ∂α = ls_obj.D
p₀ = fˡˢ(0.)
p₁ = ∂fˡˢ∂α(0.)
α₀⁽²⁾ = determine_initial_α(ls_obj, SimpleSolvers.DEFAULT_ARMIJO_α₀)
y = fˡˢ(α₀⁽²⁾)
p₂ = (y - p₀ - p₁*α₀⁽²⁾) / α₀⁽²⁾^2
p(α) = p₀ + p₁ * α + p₂ * α^2
αₜ = -p₁ / (2p₂)
1.0682684767171837
α₂ = adjust_α(αₜ, α₀⁽²⁾)
1.0682684767171837
We see that for $\alpha_2$ (as opposed to $\alpha_1$) we have $\alpha_2 = \alpha_t$, since $\alpha_t$ lies inside the interval (this is what SimpleSolvers.adjust_α checks for):
(DEFAULT_ARMIJO_σ₀ * α₀⁽²⁾, DEFAULT_ARMIJO_σ₁ * α₀⁽²⁾)
(0.512, 2.56)
x .= compute_new_iterate(x, α₂, direction(_cache))
1-element Vector{Float64}:
1.2931972894162747
We finally compute a third iterate:
gradient!(obj, x)
value!(obj, x)
update!(hess, x)
update!(_cache, x, gradient(obj), hess)
ls_obj = linesearch_objective(obj, _cache)
fˡˢ = ls_obj.F
∂fˡˢ∂α = ls_obj.D
p₀ = fˡˢ(0.)
p₁ = ∂fˡˢ∂α(0.)
α₀⁽³⁾ = determine_initial_α(ls_obj, SimpleSolvers.DEFAULT_ARMIJO_α₀)
y = fˡˢ(α₀⁽³⁾)
p₂ = (y - p₀ - p₁*α₀⁽³⁾) / α₀⁽³⁾^2
p(α) = p₀ + p₁ * α + p₂ * α^2
αₜ = -p₁ / (2p₂)
α₃ = adjust_α(αₜ, α₀⁽³⁾)
0.3759474782302077
x .= compute_new_iterate(x, α₃, direction(_cache))
1-element Vector{Float64}:
1.281997845228676
[1] This is different from the Bierlaire quadratic polynomial described in [2].