Bierlaire Quadratic Line Search

In [2] quadratic line search is defined as an interpolation between three points. For this consider

using SimpleSolvers

f(x::T) where {T<:Number} = exp(x) * (x ^ 3 - 5x + 2x) + 2one(T)
f(x::AbstractArray{T}) where {T<:Number} = exp.(x) .* (x .^ 3 - 5x + 2x) .+ 2one(T)
f!(y::AbstractVector{T}, x::AbstractVector{T}) where {T} = y .= f.(x)
j!(j::AbstractMatrix{T}, x::AbstractVector{T}) where {T} = SimpleSolvers.ForwardDiff.jacobian!(j, f!, similar(x), x)
x = rand(1)
solver = NewtonSolver(x, f.(x); F = f)
update!(solver, x)
compute_jacobian!(solver, x, j!; mode = :function)

# compute rhs
f!(cache(solver).rhs, x)
rmul!(cache(solver).rhs, -1)

# multiply rhs with jacobian
factorize!(linearsolver(solver), jacobian(cache(solver)))
ldiv!(direction(cache(solver)), linearsolver(solver), cache(solver).rhs)

obj = MultivariateObjective(f, x)
ls_obj = linesearch_objective(obj, JacobianFunction(j!, x), cache(solver))
fˡˢ = ls_obj.F
∂fˡˢ∂α = ls_obj.D

In the figure above we already plotted three points $a$, $b$ and $c$ on whose basis a second-order polynomial will be built that should approximate $f^\mathrm{ls}$.[1] The polynomial is built with the ansatz:

\[p(\alpha) = \beta_1(\alpha - a)(x - b) + \beta_2(\alpha - a) + \beta_3(\alpha - b),\]

and by identifying

\[\begin{aligned} p(a) & = f^\mathrm{ls}(a), \\ p(b) & = f^\mathrm{ls}(b), \\ p(c) & = f^\mathrm{ls}(c), \\ \end{aligned}\]

we get

\[\begin{aligned} \beta_1 & = \frac{(b - c)f^\mathrm{ls}(a) + (c - a)f^\mathrm{ls}(b) + (a - b)f^\mathrm{ls}(c)}{(a - b)(c - a)(c - b)}, \\ \beta_2 & = \frac{f^\mathrm{ls}(b)}{b - a}, \\ \beta_3 & = \frac{f^\mathrm{ls}(a)}{a - b}. \end{aligned}\]

We can plot this polynomial:

We can now easily determine the minimum of the polynomial $p$. It is:

\[\chi = \frac{1}{2} \frac{ f^\mathrm{ls}(a) * (b^2 - c^2) + f^\mathrm{ls}(b) * (c^2 - a^2) + f^\mathrm{ls}(c) * (a^2 - b^2) }{f^\mathrm{ls}(a) * (b - c) + f^\mathrm{ls}(b) * (c - a) + f^\mathrm{ls}(c) * (a - b)}.\]

We now use this $\chi$ to either replace $a$, $b$ or $c$ and distinguish between the following four scenarios:

  1. $\chi > b$ and $f^\mathrm{ls}(\chi) > f^\mathrm{ls}(b)$ $\implies$ we replace $c \gets \chi$,
  2. $\chi > b$ and $f^\mathrm{ls}(\chi) \leq f^\mathrm{ls}(b)$ $\implies$ we replace $a, b \gets b, \chi$,
  3. $\chi \leq b$ and $f^\mathrm{ls}(\chi) > f^\mathrm{ls}(b)$ $\implies$ we replace $a \gets \chi$,
  4. $\chi \leq b$ and $f^\mathrm{ls}(\chi) \leq f^\mathrm{ls}(b)$ $\implies$ we replace $b, c \gets \chi, b$.

In our example we have the second case: $\chi$ is to the right of $b$ and $f^\mathrm{ls}(\chi)$ is less than $f(b)$. We therefore replace $a$ with $b$ and $b$ with $\chi$. The new approximation is the following one:

We again observe the second case. By again replacing $a, b \gets b, \chi$ we get:

No we observe the fourth case: $\chi$ is to the left of $b$ and $f^\mathrm{ls}(\chi)$ is below $f(b)$. Hence we replace $b, c \gets \chi, b.$ A successive iteration yields:

Info

After having computed $\chi$ we further either shift it to the left or right depending on whether $(c - b)$ or $(b - a)$ is bigger respectively. The shift is made by either adding or subtracting the constant $\varepsilon$.

Also see SimpleSolvers.DEFAULT_BIERLAIRE_ε.

  • 1These points further need to satisfy $f^\mathrm{ls}(a) > f^\mathrm{ls}(b) < f^\mathrm{ls}(c)$.