GeometricOptimizers

Documentation for GeometricOptimizers.

GeometricOptimizers.AdamType
Adam(η, ρ₁, ρ₂, δ)

Make an instance of the Adam Optimizer.

Here the cache consists of first and second moments that are updated as

\[B_1 \gets ((\rho_1 - \rho_1^t)/(1 - \rho_1^t))\cdot{}B_1 + (1 - \rho_1)/(1 - \rho_1^t)\cdot{}\nabla{}L,\]

and

\[B_2 \gets ((\rho_2 - \rho_2^t)/(1 - \rho_2^t))\cdot{}B_2 + (1 - \rho_2)/(1 - \rho_2^t)\cdot\nabla{}L\odot\nabla{}L.\]

The final velocity is computed as:

\[\mathrm{velocity} \gets -\eta{}B_1/\sqrt{B_2 + \delta}.\]

Implementation

The velocity is stored in the input to save memory:

mul!(B, -o.method.η, /ᵉˡᵉ(C.B₁, scalar_add(racᵉˡᵉ(C.B₂), o.method.δ)))

where B is the input to the update! function.

The algorithm and suggested defaults are taken from [1, page 301].
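The moment updates and the velocity computation can be sketched with plain arrays (the variable names are illustrative, not the package's internals):

```julia
# Plain-array illustration of the Adam update; η, ρ₁, ρ₂, δ as in Adam(η, ρ₁, ρ₂, δ).
η, ρ₁, ρ₂, δ = 1e-3, 0.9, 0.99, 1e-8
∇L = [0.5, -1.0]                  # gradient of the loss
B₁, B₂ = zeros(2), zeros(2)       # first and second moments
t = 1                             # current time step

B₁ .= ((ρ₁ - ρ₁^t) / (1 - ρ₁^t)) .* B₁ .+ ((1 - ρ₁) / (1 - ρ₁^t)) .* ∇L
B₂ .= ((ρ₂ - ρ₂^t) / (1 - ρ₂^t)) .* B₂ .+ ((1 - ρ₂) / (1 - ρ₂^t)) .* ∇L .^ 2
velocity = -η .* B₁ ./ sqrt.(B₂ .+ δ)
```

Note that at $t = 1$ the bias correction reduces the updates to $B_1 = \nabla{}L$ and $B_2 = \nabla{}L\odot\nabla{}L$.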

source
GeometricOptimizers.AdamCacheType
AdamCache(Y)

Store the first and second moment for Y (initialized as zeros).

First and second moments are called B₁ and B₂.

If the cache is called with an instance of a homogeneous space, e.g. the StiefelManifold $St(n,N)$ it initializes the moments as elements of $\mathfrak{g}^\mathrm{hor}$ (StiefelLieAlgHorMatrix).

Examples

using GeometricOptimizers

Y = rand(StiefelManifold, 5, 3)
GeometricOptimizers.AdamCache(Y).B₁

# output

5×5 StiefelLieAlgHorMatrix{Float64, SkewSymMatrix{Float64, Vector{Float64}}, Matrix{Float64}}:
 0.0  -0.0  -0.0  -0.0  -0.0
 0.0   0.0  -0.0  -0.0  -0.0
 0.0   0.0   0.0  -0.0  -0.0
 0.0   0.0   0.0   0.0   0.0
 0.0   0.0   0.0   0.0   0.0
source
GeometricOptimizers.AdamWithDecayType
AdamWithDecay(n_epochs, η₁=1f-2, η₂=1f-6, ρ₁=9f-1, ρ₂=9.9f-1, δ=1f-8)

Make an instance of the Adam Optimizer with weight decay.

All except the first argument (the number of epochs) have defaults.

The difference from the standard Adam optimizer is that the learning rate $\eta$ changes in each step. Apart from the time dependency of $\eta$ the two algorithms are, however, equivalent. The learning rate starts at a high value $\eta_1$ and then decreases exponentially until it reaches $\eta_2$, with

\[ \eta(t) = \gamma^t\eta_1,\]

where $\gamma = \exp(\log(\eta_2 / \eta_1) / \mathtt{n\_epochs}),$ so that $\eta(\mathtt{n\_epochs}) = \eta_2.$
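The schedule can be sketched in plain Julia (illustrative names; $\gamma$ is chosen so that the rate ends at $\eta_2$ after n_epochs steps):

```julia
n_epochs = 100
η₁, η₂ = 1e-2, 1e-6
γ = exp(log(η₂ / η₁) / n_epochs)  # γ < 1, so the learning rate decays
η(t) = γ^t * η₁

(η(0), η(n_epochs))               # starts at η₁ and ends (approximately) at η₂
```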

source
GeometricOptimizers.BFGSType
BFGS(η, δ)

Make an instance of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimizer.

η is the learning rate. δ is a stabilization parameter.

source
GeometricOptimizers.EuclideanOptimizerType
EuclideanOptimizer

The optimizer that stores all the information needed for an optimization problem.

This problem can be solved by calling solve! as shown in the examples below.

Examples

F(x) = sum(sin.(x) .^ 2)
x = ones(3)
algorithm = Newton()
state = OptimizerState(algorithm, x)
optimizer = EuclideanOptimizer(x, F; algorithm = algorithm, linesearch = Bisection())

solve!(x, state, optimizer)
x

# output

3-element Vector{Float64}:
 1.1102230246251565e-16
 1.1102230246251565e-16
 1.1102230246251565e-16

We note that this same problem may have trouble converging with other line searches:

x = ones(3)
algorithm = Newton()
state = OptimizerState(algorithm, x)
optimizer = EuclideanOptimizer(x, F; algorithm = algorithm, linesearch = Backtracking())

solve!(x, state, optimizer)
x

# output

3-element Vector{Float64}:
 1.0
 1.0
 1.0
source
GeometricOptimizers.GlobalSectionType
GlobalSection(Y)

Construct a global section for Y.

A global section $\lambda$ is a mapping from a homogeneous space $\mathcal{M}$ to the corresponding Lie group $G$ such that

\[\lambda(Y)E = Y,\]

where $E$ is the canonical element of the homogeneous space (see StiefelProjection).

Also see apply_section and global_rep.
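The defining property can be checked with plain arrays for a small example (this is an illustration, not the package's GlobalSection):

```julia
using LinearAlgebra

Y     = [1. 0.; 0. 1.; 0. 0.]          # an element of St(2, 3)
Yperp = reshape([0., 0., 1.], 3, 1)    # completion to an orthonormal basis
λY    = hcat(Y, Yperp)                 # a section: an orthogonal 3×3 matrix
E     = [1. 0.; 0. 1.; 0. 0.]          # the canonical element

λY * E ≈ Y
```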

Implementation

For an implementation of GlobalSection for a custom array (especially manifolds), the function global_section has to be generalized.

source
GeometricOptimizers.GrassmannLieAlgHorMatrixType
GrassmannLieAlgHorMatrix(B::AbstractMatrix, N::Integer, n::Integer)

Build an instance of GrassmannLieAlgHorMatrix based on an arbitrary matrix B of size $(N-n)\times{}n$.

GrassmannLieAlgHorMatrix is the horizontal component of the Lie algebra of skew-symmetric matrices (with respect to the canonical metric).

Extended help

The projection here is: $\pi:S \to SE/\sim$ where

\[E = \begin{bmatrix} \mathbb{I}_{n} \\ \mathbb{O}_{(N-n)\times{}n} \end{bmatrix},\]

and the equivalence relation is

\[V_1 \sim V_2 \iff \exists A\in\mathcal{S}_\mathrm{skew}(n) \text{ such that } V_2 = V_1 + \begin{bmatrix} A \\ \mathbb{O} \end{bmatrix}\]

An element of GrassmannLieAlgHorMatrix takes the form:

\[\begin{pmatrix} \bar{\mathbb{O}} & B^T \\ B & \mathbb{O} \end{pmatrix},\]

where $\bar{\mathbb{O}}\in\mathbb{R}^{n\times{}n}$ and $\mathbb{O}\in\mathbb{R}^{(N - n)\times(N-n)}.$

source
GeometricOptimizers.GrassmannLieAlgHorMatrixMethod
GrassmannLieAlgHorMatrix(D::AbstractMatrix, n::Integer)

Take a big matrix as input and build an instance of GrassmannLieAlgHorMatrix.

The integer $N$ in $Gr(n, N)$ here is the number of rows of D.

Extended help

If the constructor is called with a big $N\times{}N$ matrix, then the projection is performed the following way:

\[\begin{pmatrix} A & B_1 \\ B_2 & D \end{pmatrix} \mapsto \begin{pmatrix} \bar{\mathbb{O}} & -B_2^T \\ B_2 & \mathbb{O} \end{pmatrix}.\]

This can also be seen as the operation:

\[D \mapsto \Omega(E, DE - EE^TDE),\]

where $\Omega$ is the horizontal lift GeometricOptimizers.Ω.

source
GeometricOptimizers.LowerTriangularType
LowerTriangular(S::AbstractVector, n::Int)

Build a lower-triangular matrix from a vector.

A lower-triangular matrix is an $n\times{}n$ matrix that has zeros on the diagonal and in the upper triangle.

The data are stored in a vector $S$ similarly to other matrices. See UpperTriangular, SkewSymMatrix and SymmetricMatrix.

The struct has two fields: S and n. The first stores all the entries of the matrix in a sparse fashion (in a vector) and the second is the dimension $n$ for $A\in\mathbb{R}^{n\times{}n}$.

Examples

using GeometricOptimizers
S = [1, 2, 3, 4, 5, 6]
LowerTriangular(S, 4)

# output

4×4 LowerTriangular{Int64, Vector{Int64}}:
 0  0  0  0
 1  0  0  0
 2  3  0  0
 4  5  6  0
source
GeometricOptimizers.LowerTriangularMethod
LowerTriangular(A::AbstractMatrix)

Build a lower-triangular matrix from a matrix.

This is done by taking the lower left of that matrix.

Examples

using GeometricOptimizers
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
LowerTriangular(M)

# output

4×4 LowerTriangular{Int64, Vector{Int64}}:
  0   0   0  0
  5   0   0  0
  9  10   0  0
 13  14  15  0
source
GeometricOptimizers.MomentumType
Momentum(η, α)

Make an instance of the momentum optimizer.

The momentum optimizer is similar to the SimpleSolvers.Gradient. It however has a nontrivial cache that stores past history (see MomentumCache). The cache is updated via:

\[ B^{\mathrm{cache}} \gets \alpha{}B^{\mathrm{cache}} + \nabla_\mathrm{weights}L\]

and then the final velocity is computed as

\[ \mathrm{velocity} \gets - \eta{}B^{\mathrm{cache}}.\]

Implementation

To save memory the velocity is stored in the input $\nabla_WL$. This is similar to the case of the SimpleSolvers.Gradient.
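A plain-array sketch of the two formulas above (the names are illustrative):

```julia
η, α = 0.1, 0.9
∇L = [1.0, -2.0]      # gradient of the loss
B = zeros(2)          # cache storing past history

B .= α .* B .+ ∇L     # update the cache
velocity = -η .* B    # compute the final velocity
```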

source
GeometricOptimizers.NewtonOptimizerCacheType
NewtonOptimizerCache <: OptimizerCache

Keys

  • x: current iterate (this stores the guess called by the functions generated with linesearch_problem),
  • Δx: direction of the optimization step (the difference between the current and the previous iterate); this is obtained by multiplying rhs with the inverse of the Hessian,
  • g: gradient value (this stores the gradient associated with x called by the derivative part of linesearch_problem),
  • Δg: gradient difference (the difference between the current and the previous gradient); this is used for computing the OptimizerStatus,
  • rhs: the right-hand side used to compute the update,
  • H: the Hessian matrix evaluated at x.

Also compare this to SimpleSolvers.NonlinearSolverCache.

source
GeometricOptimizers.OptimizerType
Optimizer(method, cache, step, retraction)

Store the method (e.g. Adam with corresponding hyperparameters), the cache (e.g. AdamCache), the optimization step and the retraction.

It takes as input an optimization method and the parameters of a network.

Before one can call Optimizer a OptimizerMethod that stores all the hyperparameters of the optimizer needs to be specified.

source
GeometricOptimizers.OptimizerMethod
Optimizer(method, nn_params)

Allocate the cache for a specific method and nn_params for an instance of Optimizer.

Internally this calls init_optimizer_cache.

An equivalent constructor is

Optimizer(method, nn::NeuralNetwork)

Arguments

The optional keyword argument is the retraction. By default this is cayley.

source
GeometricOptimizers.OptimizerMethodType
OptimizerMethod

Each Optimizer has to be called with an OptimizerMethod. This specifies how the neural network weights are updated in each optimization step.

source
GeometricOptimizers.OptimizerStateType

OptimizerState

An OptimizerState is a data structure that is used to dispatch on different algorithms.

It needs to implement three methods,

initialize!(alg::OptimizerState, ::AbstractVector)
update!(alg::OptimizerState, ::AbstractVector)
solver_step!(::AbstractVector, alg::OptimizerState)

that initialize and update the state of the algorithm and perform an actual optimization step.

Further the following convenience methods should be implemented,

problem(alg::OptimizerState)
gradient(alg::OptimizerState)
hessian(alg::OptimizerState)
linesearch(alg::OptimizerState)

which return the problem to optimize, its gradient and (approximate) Hessian as well as the linesearch algorithm used in conjunction with the optimization algorithm if any.

See NewtonOptimizerState for a struct that was derived from OptimizerState.

Info

Note that an OptimizerState is not necessarily a NewtonOptimizerState, as we can also have other optimizers, Adam for example.

source
GeometricOptimizers.OptimizerStatusType
OptimizerStatus

Contains residuals (relative and absolute) and various convergence properties.

This is also used in OptimizerResult.

Examples

x = ones(3)
state = NewtonOptimizerState(x)
cache = NewtonOptimizerCache(x)
f = 1.
config = Options()
OptimizerStatus(state, cache, f; config = config)

# output

 * Convergence measures

    |x - x'|               = NaN
    |x - x'|/|x'|          = NaN
    |f(x) - f(x')|         = NaN
    |f(x) - f(x')|/|f(x')| = NaN
    |g(x) - g(x')|         = NaN
    |g(x)|                 = NaN
source
GeometricOptimizers.SkewSymMatrixType
SkewSymMatrix(S::AbstractVector, n::Integer)

Instantiate a skew-symmetric matrix with information stored in vector S.

A skew-symmetric matrix $A$ is a matrix with $A^T = -A$.

Internally the struct saves a vector $S$ of size $n(n-1)\div2$. The conversion is done the following way:

\[[A]_{ij} = \begin{cases} 0 & \text{if $i=j$} \\ S[( (i-2) (i-1) ) \div 2 + j] & \text{if $i>j$}\\ -S[( (j-2) (j-1) ) \div 2 + i] & \text{else}. \end{cases}\]

So $S$ stores a string of vectors taken from $A$: $S = [\tilde{a}_1, \tilde{a}_2, \ldots, \tilde{a}_n]$ with $\tilde{a}_i = [[A]_{i1},[A]_{i2},\ldots,[A]_{i(i-1)}]$.
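The index formula can be replicated with a small helper (a hypothetical name; note the minus sign for entries above the diagonal, which skew-symmetry requires):

```julia
# Reconstruct the dense matrix from the vector S using the index formula above.
function skew_from_vector(S::AbstractVector, n::Integer)
    A = zeros(eltype(S), n, n)
    for i in 1:n, j in 1:n
        if i > j
            A[i, j] = S[((i - 2) * (i - 1)) ÷ 2 + j]
        elseif i < j
            A[i, j] = -S[((j - 2) * (j - 1)) ÷ 2 + i]
        end
    end
    A
end

skew_from_vector([1, 2, 3, 4, 5, 6], 4)   # reproduces the matrix in the example below
```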

Also see SymmetricMatrix, LowerTriangular and UpperTriangular.

Examples

using GeometricOptimizers
S = [1, 2, 3, 4, 5, 6]
SkewSymMatrix(S, 4)

# output

4×4 SkewSymMatrix{Int64, Vector{Int64}}:
 0  -1  -2  -4
 1   0  -3  -5
 2   3   0  -6
 4   5   6   0
source
GeometricOptimizers.SkewSymMatrixMethod
SkewSymMatrix(A::AbstractMatrix)

Perform 0.5 * (A - A') and store the matrix in an efficient way (as a vector with $n(n-1)/2$ entries).

If the constructor is called with a matrix as input it returns a skew-symmetric matrix via the projection:

\[A \mapsto \frac{1}{2}(A - A^T).\]

Examples

using GeometricOptimizers
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
SkewSymMatrix(M)

# output

4×4 SkewSymMatrix{Float64, Vector{Float64}}:
 0.0  -1.5  -3.0  -4.5
 1.5   0.0  -1.5  -3.0
 3.0   1.5   0.0  -1.5
 4.5   3.0   1.5   0.0

Extended help

Note that the constructor is designed in such a way that it always returns matrices of type SkewSymMatrix{<:AbstractFloat} when called with a matrix, even if this matrix is of type AbstractMatrix{<:Integer}.

If the user wishes to allocate a matrix SkewSymMatrix{<:Integer} then call:

SkewSymMatrix(::AbstractVector, n::Integer)

Note that this is different from LowerTriangular and UpperTriangular as no projection takes place there.

source
GeometricOptimizers.StiefelLieAlgHorMatrixType
StiefelLieAlgHorMatrix(A::SkewSymMatrix, B::AbstractMatrix, N::Integer, n::Integer)

Build an instance of StiefelLieAlgHorMatrix based on a skew-symmetric matrix A and an arbitrary matrix B.

An element of StiefelLieAlgHorMatrix takes the form:

\[\begin{pmatrix} A & B^T \\ B & \mathbb{O} \end{pmatrix},\]

where $A$ is skew-symmetric (this is SkewSymMatrix in GeometricOptimizers).

Also see GrassmannLieAlgHorMatrix.

Extended help

StiefelLieAlgHorMatrix is the horizontal component of the Lie algebra of skew-symmetric matrices (with respect to the canonical metric).

The projection here is: $\pi:S \to SE$ where

\[E = \begin{bmatrix} \mathbb{I}_{n} \\ \mathbb{O}_{(N-n)\times{}n} \end{bmatrix}.\]

The matrix $E$ is implemented under StiefelProjection in GeometricOptimizers.

source
GeometricOptimizers.StiefelLieAlgHorMatrixMethod
StiefelLieAlgHorMatrix(D::AbstractMatrix, n::Integer)

Take a big matrix as input and build an instance of StiefelLieAlgHorMatrix.

The integer $N$ in $St(n, N)$ is the number of rows of D.

Extended help

If the constructor is called with a big $N\times{}N$ matrix, then the projection is performed the following way:

\[\begin{pmatrix} A & B_1 \\ B_2 & D \end{pmatrix} \mapsto \begin{pmatrix} \mathrm{skew}(A) & -B_2^T \\ B_2 & \mathbb{O} \end{pmatrix}.\]

The operation $\mathrm{skew}:\mathbb{R}^{n\times{}n}\to\mathcal{S}_\mathrm{skew}(n)$ is the skew-symmetrization operation. This is equivalent to calling of SkewSymMatrix with an $n\times{}n$ matrix.

This can also be seen as the operation:

\[D \mapsto \Omega(E, DE) = \mathrm{skew}\left(2 \left(\mathbb{I} - \frac{1}{2} E E^T \right) DE E^T\right).\]
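This identity can be verified numerically with plain arrays (a sketch; Ω, skew etc. are local names, not the package functions):

```julia
using LinearAlgebra, Random

# Check that skew(2(I - ½EEᵀ)DEEᵀ) reproduces the block structure
# [skew(A)  -B₂ᵀ; B₂  𝕆] from the projection above.
Random.seed!(123)
N, n = 5, 2
D = randn(N, N)
E = vcat(Matrix(1.0I, n, n), zeros(N - n, n))
skew(M) = (M - M') / 2

Ω = skew(2 * (I - E * E' / 2) * D * E * E')

A  = D[1:n, 1:n]
B₂ = D[(n + 1):N, 1:n]
Ω[1:n, 1:n] ≈ skew(A) && Ω[(n + 1):N, 1:n] ≈ B₂
```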

Also see GeometricOptimizers.Ω.

source
GeometricOptimizers.StiefelManifoldType
StiefelManifold <: Manifold

An implementation of the Stiefel manifold [2]. The Stiefel manifold is the collection of all matrices $Y\in\mathbb{R}^{N\times{}n}$ whose columns are orthonormal, i.e.

\[ St(n, N) = \{Y: Y^TY = \mathbb{I}_n \}.\]

The Stiefel manifold can be shown to have manifold structure (as the name suggests) and this is heavily used in GeometricOptimizers. It is further a compact space. More information can be found in the docstrings for rgrad(::StiefelManifold, ::AbstractMatrix) and metric(::StiefelManifold, ::AbstractMatrix, ::AbstractMatrix).

source
GeometricOptimizers.StiefelProjectionType
StiefelProjection(backend, T, N, n)

Make a matrix of the form $\begin{bmatrix} \mathbb{I} & \mathbb{O} \end{bmatrix}^T$ for a specific backend and data type.

An array that essentially does vcat(I(n), zeros(N-n, n)) with GPU support.

Extended help

An instance of StiefelProjection should technically also belong to StiefelManifold.

source
GeometricOptimizers.StiefelProjectionMethod
StiefelProjection(A::AbstractMatrix)

Extract necessary information from A and build an instance of StiefelProjection.

Necessary information here refers to the backend, the data type and the size of the matrix.

source
GeometricOptimizers.StiefelProjectionMethod
StiefelProjection(B::AbstractLieAlgHorMatrix)

Extract necessary information from B and build an instance of StiefelProjection.

Necessary information here refers to the backend, the data type and the size of the matrix.

The size is queried through B.N and B.n.

Examples

using GeometricOptimizers
using GeometricOptimizers: StiefelProjection

B₁ = rand(StiefelLieAlgHorMatrix, 5, 2)
B₂ = rand(GrassmannLieAlgHorMatrix, 5, 2)
E = [1. 0.; 0. 1.; 0. 0.; 0. 0.; 0. 0.]

StiefelProjection(B₁) ≈ StiefelProjection(B₂) ≈ E 

# output

true
source
GeometricOptimizers.SymmetricMatrixType
SymmetricMatrix(S::AbstractVector, n::Integer)

Instantiate a symmetric matrix with information stored in vector S.

A SymmetricMatrix $A$ is a matrix with $A^T = A$.

Internally the struct saves a vector $S$ of size $n(n+1)\div2$. The conversion is done the following way:

\[[A]_{ij} = \begin{cases} S[( (i-1) i ) \div 2 + j] & \text{if $i\geq{}j$}\\ S[( (j-1) j ) \div 2 + i] & \text{else}. \end{cases}\]

So $S$ stores a string of vectors taken from $A$: $S = [\tilde{a}_1, \tilde{a}_2, \ldots, \tilde{a}_n]$ with $\tilde{a}_i = [[A]_{i1},[A]_{i2},\ldots,[A]_{ii}]$.
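The index formula can be replicated with a small helper (a hypothetical name, not part of the package):

```julia
# Reconstruct the dense matrix from the vector S using the index formula above.
function sym_from_vector(S::AbstractVector, n::Integer)
    A = zeros(eltype(S), n, n)
    for i in 1:n, j in 1:n
        A[i, j] = i ≥ j ? S[((i - 1) * i) ÷ 2 + j] : S[((j - 1) * j) ÷ 2 + i]
    end
    A
end

sym_from_vector([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4)   # reproduces the example below
```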

Also see SkewSymMatrix, LowerTriangular and UpperTriangular.

Examples

using GeometricOptimizers
S = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
SymmetricMatrix(S, 4)

# output

4×4 SymmetricMatrix{Int64, Vector{Int64}}:
 1  2  4   7
 2  3  5   8
 4  5  6   9
 7  8  9  10
source
GeometricOptimizers.SymmetricMatrixMethod
SymmetricMatrix(A::AbstractMatrix)

Perform a projection and store the matrix in an efficient way (as a vector with $n(n+1)/2$ entries).

If the constructor is called with a matrix as input it returns a symmetric matrix via the projection:

\[A \mapsto \frac{1}{2}(A + A^T).\]

Examples

using GeometricOptimizers
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
SymmetricMatrix(M)

# output

4×4 SymmetricMatrix{Float64, Vector{Float64}}:
 1.0   3.5   6.0   8.5
 3.5   6.0   8.5  11.0
 6.0   8.5  11.0  13.5
 8.5  11.0  13.5  16.0

Extended help

Note that the constructor is designed in such a way that it always returns matrices of type SymmetricMatrix{<:AbstractFloat} when called with a matrix, even if this matrix is of type AbstractMatrix{<:Integer}.

If the user wishes to allocate a matrix SymmetricMatrix{<:Integer} then call

SymmetricMatrix(::AbstractVector, n::Integer)

Note that this is different from LowerTriangular and UpperTriangular as no projection takes place there.

source
GeometricOptimizers.UpperTriangularType
UpperTriangular(S::AbstractVector, n::Int)

Build an upper-triangular matrix from a vector.

An upper-triangular matrix is an $n\times{}n$ matrix that has zeros on the diagonal and in the lower triangle.

The data are stored in a vector $S$ similarly to other matrices. See LowerTriangular, SkewSymMatrix and SymmetricMatrix.

The struct has two fields: S and n. The first stores all the entries of the matrix in a sparse fashion (in a vector) and the second is the dimension $n$ for $A\in\mathbb{R}^{n\times{}n}$.

Examples

using GeometricOptimizers
S = [1, 2, 3, 4, 5, 6]
UpperTriangular(S, 4)

# output

4×4 UpperTriangular{Int64, Vector{Int64}}:
 0  1  2  4
 0  0  3  5
 0  0  0  6
 0  0  0  0
source
GeometricOptimizers.UpperTriangularMethod
UpperTriangular(A::AbstractMatrix)

Build an upper-triangular matrix from a matrix.

This is done by taking the upper right of that matrix.

Examples

using GeometricOptimizers
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
UpperTriangular(M)

# output

4×4 UpperTriangular{Int64, Vector{Int64}}:
 0  2  3   4
 0  0  7   8
 0  0  0  12
 0  0  0   0
source
GeometricOptimizers._BFGSCacheType
BFGSCache(B)

Make the cache for the BFGS optimizer based on the array B.

It stores an array for the gradient of the previous time step B and the inverse of the Hessian matrix H.

The cache for the inverse of the Hessian is initialized with the identity. The cache for the previous gradient information is initialized with the zero vector.

Note that the cache for H is changed iteratively, whereas the cache for B is newly assigned at every time step.

source
GeometricOptimizers._GradientType
Gradient(η)

Make an instance of a gradient optimizer.

This is the simplest neural network optimizer. It has no cache and computes the final velocity as:

\[ \mathrm{velocity} \gets - \eta\nabla_\mathrm{weight}L.\]

Implementation

The operations are done as memory efficiently as possible. This means the provided $\nabla_WL$ is mutated via:

rmul!(∇L, -method.η)
source
Base.:*Method
λY * Y

Apply the group element λY to Y.

Here λY is an element of a Lie group and Y is an element of a homogeneous space.

source
Base.randMethod
rand(backend, manifold_type, N, n)

Draw random elements for a specific device.

Examples

Random elements of the manifold can be allocated on GPU. Call ...

rand(CUDABackend(), StiefelManifold{Float32}, N, n)

... for drawing elements on a CUDA device.

source
Base.randMethod
rand(manifold_type, N, n)

Draw random elements from the Stiefel and the Grassmann manifold.

Because both of these manifolds are compact spaces we can sample them uniformly [4].

Examples

When we call ...

using GeometricOptimizers
using GeometricOptimizers: _round # hide
import Random
Random.seed!(123)

N, n = 5, 3
Y = rand(StiefelManifold{Float32}, N, n)
_round(Y; digits = 5) # hide

# output

5×3 StiefelManifold{Float32, Matrix{Float32}}:
 -0.27575   0.32991   0.77275
 -0.62485  -0.33224  -0.0686
 -0.69333   0.36724  -0.18988
 -0.09295  -0.73145   0.46064
  0.2102    0.33301   0.38717

... the sampling is done by first allocating a random matrix of size $N\times{}n$ via Y = randn(Float32, N, n).

We then perform a QR decomposition Q, R = qr(Y) with the qr function from the LinearAlgebra package (this is using Householder reflections internally).

The final output is then given by the first n columns of the Q matrix.
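The three steps described above can be sketched with plain LinearAlgebra calls (this mirrors the description, not the package internals):

```julia
using LinearAlgebra, Random

Random.seed!(123)
N, n = 5, 3
A = randn(Float32, N, n)     # 1. allocate a random N×n matrix
Q = Matrix(qr(A).Q)          # 2. QR decomposition (Householder reflections)
Y = Q[:, 1:n]                # 3. keep the first n columns

Y' * Y ≈ Matrix{Float32}(I, n, n)   # the columns are orthonormal
```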

source
Base.vecMethod
vec(A::AbstractTriangular)

Return the associated vector to $A$.

Examples

using GeometricOptimizers

M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
LowerTriangular(M) |> vec

# output

6-element Vector{Int64}:
  5
  9
 10
 13
 14
 15
source
Base.vecMethod
vec(A)

Output the associated vector of A.

Examples

using GeometricOptimizers

M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
SkewSymMatrix(M) |> vec

# output

6-element Vector{Float64}:
 1.5
 3.0
 1.5
 4.5
 3.0
 1.5
source
Base.vecMethod
vec(A::StiefelLieAlgHorMatrix)

Vectorize A.

Examples

using GeometricOptimizers

A = SkewSymMatrix([1, ], 2)
B = [2 3; ]
B̄ = StiefelLieAlgHorMatrix(A, B, 3, 2)
B̄ |> vec

# output

vcat(1-element Vector{Int64}, 2-element Vector{Int64}):
 1
 2
 3

Implementation

This is using Vcat from the package LazyArrays.

source
GeometricBase.update!Method
update!(cache::NewtonOptimizerCache, x, g, hes)

Update an instance of NewtonOptimizerCache based on x.

This is used in update!(::OptimizerState, ::AbstractVector).

This sets:

\[\begin{aligned} x^\mathtt{cache} & \gets x, \\ g^\mathtt{cache} & \gets g, \\ \mathrm{rhs}^\mathtt{cache} & \gets -g, \\ H^\mathtt{cache} & \gets H(x), \\ \delta^\mathtt{cache} & \gets (H^\mathtt{cache})^{-1}\mathrm{rhs}^\mathtt{cache}, \end{aligned}\]

where we wrote $H$ for the Hessian (i.e. the input argument hes).

source
GeometricBase.update!Method
update!(state::NewtonOptimizerState, gradient, x)

Update an instance of NewtonOptimizerState based on x and gradient, where gradient is of type SimpleSolvers.Gradient.

Examples

If we only call update! once there are still NaNs for x̄, ḡ and f̄.

f(x) = sum(x.^2)
x = [1., 2.]
state = NewtonOptimizerState(x)
grad = GradientAutodiff{Float64}(f, length(x))
update!(state, grad, x)

# output

NewtonOptimizerState{Float64, Vector{Float64}, Vector{Float64}}(0, [1.0, 2.0], [NaN, NaN], [2.0, 4.0], [NaN, NaN], 5.0, NaN)
source
GeometricBase.update!Method
update!(o, cache, B)

Update the cache and output a final velocity that is stored in B.

Note that $B\in\mathfrak{g}^\mathrm{hor}$ in general.

In the manifold case the final velocity is the input to a retraction.

source
GeometricBase.update!Method
update!(o::Optimizer{<:BFGS}, C, B)

Perform an update with the BFGS optimizer.

C is the cache, B contains the gradient information (the output of global_rep in general).

First we compute the final velocity with

vecS = -o.method.η * C.H * vec(B)

and then we update H

C.H .= (𝕀 - ρ * SY) * C.H * (𝕀 - ρ * SY') + ρ * vecS * vecS'

where SY is vecS * Y' and 𝕀 is the identity.

Implementation

For stability we use δ for computing ρ:

ρ = 1. / (vecS' * Y + o.method.δ)

This is similar to how δ is used in the Adam optimizer.

Extended help

If we have weights on a Manifold then the updates are slightly more difficult. In this case the vec operation has to be generalized to the corresponding global tangent space.

source
GeometricBase.update!Method
update!(cache, x, g)

Update the BFGSCache based on x and g.

Extended help

The update rule used here can be found in [5] and [3]:

It does:

\[\begin{aligned} \delta & \gets x^{(k)} - x^{(k-1)}, \\ \gamma & \gets \nabla{}f^{(k)} - \nabla{}f^{(k-1)}, \\ T_1 & \gets \delta\gamma^TQ, \\ T_2 & \gets Q\gamma\delta^T, \\ T_3 & \gets \left(1 + \frac{\gamma^TQ\gamma}{\delta^T\gamma}\right)\delta\delta^T,\\ Q & \gets Q - (T_1 + T_2 - T_3)/(\delta^T\gamma). \end{aligned}\]
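The update rule can be written out with plain arrays (a sketch with illustrative names; for a quadratic objective the updated $Q$ satisfies the secant equation $Q\gamma = \delta$):

```julia
using LinearAlgebra

# Inverse-Hessian update of BFGS, following the rule above.
function bfgs_update(Q, δ, γ)
    δγ = dot(δ, γ)                       # δᵀγ
    T₁ = δ * γ' * Q
    T₂ = Q * γ * δ'
    T₃ = (1 + dot(γ, Q * γ) / δγ) * δ * δ'
    Q - (T₁ + T₂ - T₃) / δγ
end

A = [2.0 0.0; 0.0 4.0]                   # Hessian of f(x) = ½xᵀAx
δ = [0.1, -0.2]                          # step taken
γ = A * δ                                # corresponding gradient difference
Q = bfgs_update(Matrix(1.0I, 2, 2), δ, γ)

Q * γ ≈ δ                                # secant equation
```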

source
GeometricOptimizers.apply_sectionMethod
apply_section(λY::GlobalSection{T, AT}, Y₂::AT) where {T, AT <: StiefelManifold{T}}

Apply λY to Y₂.

Mathematically this is the group action of the element $\lambda{}Y\in{}G$ on the element $Y_2$ of the homogeneous space $\mathcal{M}$.

Internally it calls apply_section!.

source
GeometricOptimizers.cayleyMethod
cayley(B̄::StiefelLieAlgHorMatrix)

Compute the Cayley retraction of B̄.

Implementation

Internally this is using

\[\mathrm{Cayley}(\bar{B}) = \mathbb{I} + \frac{1}{2} B' (\mathbb{I}_{2n} - \frac{1}{2} (B'')^T B')^{-1} (B'')^T (\mathbb{I} + \frac{1}{2} \bar{B}),\]

with

\[\bar{B} = \begin{bmatrix} A & -B^T \\ B & \mathbb{O} \end{bmatrix} = \begin{bmatrix} \frac{1}{2}A & \mathbb{I} \\ B & \mathbb{O} \end{bmatrix} \begin{bmatrix} \mathbb{I} & \mathbb{O} \\ \frac{1}{2}A & -B^T \end{bmatrix} =: B'(B'')^T,\]

i.e. $\bar{B}$ is expressed as a product of two $N\times{}2n$ matrices.
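The factorization $\bar{B} = B'(B'')^T$ can be checked numerically with plain arrays (B′ and B′′ are local names for the two $N\times{}2n$ factors):

```julia
using LinearAlgebra, Random

Random.seed!(1)
N, n = 5, 2
M = randn(n, n)
A = (M - M') / 2                              # skew-symmetric block
B = randn(N - n, n)
B̄ = [A (-B'); B zeros(N - n, N - n)]          # element of 𝔤ʰᵒʳ

B′  = [(A / 2) Matrix(1.0I, n, n); B zeros(N - n, n)]     # N × 2n
B′′ = [Matrix(1.0I, n, n) (A' / 2); zeros(N - n, n) (-B)] # N × 2n

B′ * B′′' ≈ B̄
```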

source
GeometricOptimizers.cayleyMethod
cayley(Y::Manifold, Δ)

Take as input an element of a manifold Y and a tangent vector in Δ in the corresponding tangent space and compute the Cayley retraction.

In different notation: take as input an element $x$ of $\mathcal{M}$ and an element of $T_x\mathcal{M}$ and return $\mathrm{Cayley}(v_x).$

Examples

using GeometricOptimizers

Y = StiefelManifold([1. 0. 0.;]' |> Matrix)
Δ = [0. .5 0.;]' |> Matrix
Y₂ = GeometricOptimizers.cayley(Y, Δ)

Y₂' * Y₂ ≈ [1.;]

# output

true

See the example in geodesic(::Manifold{T}, ::AbstractMatrix{T}).

source
GeometricOptimizers.geodesicMethod
geodesic(B̄::StiefelLieAlgHorMatrix)

Compute the geodesic of an element in StiefelLieAlgHorMatrix.

Implementation

Internally this is using:

\[\mathbb{I} + B'\mathfrak{A}(B', B'')B'',\]

with

\[\bar{B} = \begin{bmatrix} A & -B^T \\ B & \mathbb{O} \end{bmatrix} = \begin{bmatrix} \frac{1}{2}A & \mathbb{I} \\ B & \mathbb{O} \end{bmatrix} \begin{bmatrix} \mathbb{I} & \mathbb{O} \\ \frac{1}{2}A & -B^T \end{bmatrix} =: B'(B'')^T.\]

This is using a computationally efficient version of the matrix exponential $\mathfrak{A}$.

See GeometricOptimizers.𝔄.

source
GeometricOptimizers.geodesicMethod
geodesic(Y::Manifold, Δ)

Take as input an element of a manifold Y and a tangent vector in Δ in the corresponding tangent space and compute the geodesic (exponential map).

In different notation: take as input an element $x$ of $\mathcal{M}$ and an element of $T_x\mathcal{M}$ and return $\mathtt{geodesic}(x, v_x) = \exp(v_x).$

Examples

using GeometricOptimizers

Y = StiefelManifold([1. 0. 0.;]' |> Matrix)
Δ = [0. .5 0.;]' |> Matrix
Y₂ = GeometricOptimizers.geodesic(Y, Δ)

Y₂' * Y₂ ≈ [1.;]

# output

true

Implementation

Internally this geodesic method calls geodesic(::StiefelLieAlgHorMatrix).

source
GeometricOptimizers.global_repMethod
global_rep(λY::GlobalSection{T, AT}, Δ::AbstractMatrix{T}) where {T, AT<:GrassmannManifold{T}}

Express Δ (an element of the tangent space of Y) as an instance of GrassmannLieAlgHorMatrix.

The method global_rep for GrassmannManifold is similar to that for StiefelManifold.

Examples

using GeometricOptimizers
using GeometricOptimizers: _round
import Random 

Random.seed!(123)

Y = rand(GrassmannManifold, 6, 3)
Δ = rgrad(Y, randn(6, 3))
λY = GlobalSection(Y)

_round(global_rep(λY, Δ); digits = 3)

# output

6×6 GrassmannLieAlgHorMatrix{Float64, Matrix{Float64}}:
  0.0     0.0     0.0     0.981  -2.058   0.4
  0.0     0.0     0.0    -0.424   0.733  -0.919
  0.0     0.0     0.0    -1.815   1.409   1.085
 -0.981   0.424   1.815   0.0     0.0     0.0
  2.058  -0.733  -1.409   0.0     0.0     0.0
 -0.4     0.919  -1.085   0.0     0.0     0.0
source
GeometricOptimizers.global_repMethod
global_rep(λY::GlobalSection{T, AT}, Δ::AbstractMatrix{T}) where {T, AT<:StiefelManifold{T}}

Express Δ (an element of the tangent space of Y) as an instance of StiefelLieAlgHorMatrix.

This maps an element from $T_Y\mathcal{M}$ to an element of $\mathfrak{g}^\mathrm{hor}$.

These two spaces are isomorphic, and the isomorphism is established through $\lambda(Y)\in{}G$ via:

\[T_Y\mathcal{M} \to \mathfrak{g}^{\mathrm{hor}}, \Delta \mapsto \lambda(Y)^{-1}\Omega(Y, \Delta)\lambda(Y).\]

Also see GeometricOptimizers.Ω.

Examples

using GeometricOptimizers
using GeometricOptimizers: _round
import Random 

Random.seed!(123)

Y = rand(StiefelManifold, 6, 3)
Δ = rgrad(Y, randn(6, 3))
λY = GlobalSection(Y)

_round(global_rep(λY, Δ); digits = 3)

# output

6×6 StiefelLieAlgHorMatrix{Float64, SkewSymMatrix{Float64, Vector{Float64}}, Matrix{Float64}}:
  0.0     0.679   1.925   0.981  -2.058   0.4
 -0.679   0.0     0.298  -0.424   0.733  -0.919
 -1.925  -0.298   0.0    -1.815   1.409   1.085
 -0.981   0.424   1.815   0.0     0.0     0.0
  2.058  -0.733  -1.409   0.0     0.0     0.0
 -0.4     0.919  -1.085   0.0     0.0     0.0

Implementation

The function global_rep does not in fact perform the entire map $\lambda(Y)^{-1}\Omega(Y, \Delta)\lambda(Y)$ but only

\[\Delta \mapsto \mathrm{skew}(Y^T\Delta),\]

to get the small skew-symmetric matrix $A\in\mathcal{S}_\mathrm{skew}(n)$ and

\[\Delta \mapsto (\lambda(Y)_{[1:N, n:N]}^T \Delta)_{[1:(N-n), 1:n]},\]

to get the arbitrary matrix $B\in\mathbb{R}^{(N-n)\times{}n}$.

source
GeometricOptimizers.global_sectionMethod
global_section(Y::StiefelManifold)

Compute a matrix of size $N\times(N-n)$ whose columns are orthogonal to the columns in Y.

This matrix is also called $Y_\perp$ [68].

Examples

using GeometricOptimizers
using GeometricOptimizers: global_section
import Random

Random.seed!(123)

Y = StiefelManifold([1. 0.; 0. 1.; 0. 0.; 0. 0.])

round.(Matrix(global_section(Y)); digits = 3)

# output

4×2 Matrix{Float64}:
 0.0    -0.0
 0.0     0.0
 0.936  -0.353
 0.353   0.936

Further note that we convert the QRCompactWYQ object to a Matrix before we display it.

Implementation

The implementation is done with a QR decomposition (LinearAlgebra.qr!). Internally we do:

A = randn(N, N - n) # or the gpu equivalent
A = A - Y.A * (Y.A' * A)
qr!(A).Q
source
GeometricOptimizers.meets_stopping_criteriaMethod
meets_stopping_criteria(status, config, iterations)

Check if the optimizer has converged.

Implementation

meets_stopping_criteria checks if one of the following is true:

  • converged (the output of SimpleSolvers.assess_convergence) is true and iterations $\geq$ config.min_iterations,
  • if config.allow_f_increases is false: status.f_increased is true,
  • iterations $\geq$ config.max_iterations,
  • status.rxₐ $>$ config.x_abstol_break
  • status.rxᵣ $>$ config.x_reltol_break
  • status.rfₐ $>$ config.f_abstol_break
  • status.rfᵣ $>$ config.f_reltol_break
  • status.rg $>$ config.g_restol_break
  • status.x_isnan
  • status.f_isnan
  • status.g_isnan
source
GeometricOptimizers.metricMethod
metric(Y::GrassmannManifold, Δ₁::AbstractMatrix, Δ₂::AbstractMatrix)

Compute the metric for vectors Δ₁ and Δ₂ at Y.

The representation of the Grassmann manifold is realized as a quotient space of the Stiefel manifold.

The metric for the Grassmann manifold is:

\[g^{Gr}_Y(\Delta_1, \Delta_2) = g^{St}_Y(\Delta_1, \Delta_2) = \mathrm{Tr}(\Delta_1^T (\mathbb{I} - \frac{1}{2} Y Y^T) \Delta_2) = \mathrm{Tr}(\Delta_1^T \Delta_2),\]

where we used that $Y^T\Delta_i = \mathbb{O}$ for $i = 1, 2.$

source
GeometricOptimizers.metricMethod
metric(Y::StiefelManifold, Δ₁::AbstractMatrix, Δ₂::AbstractMatrix)

Compute the inner product of Δ₁ and Δ₂ at Y.

This uses the canonical Riemannian metric for the Stiefel manifold:

\[g_Y: (\Delta_1, \Delta_2) \mapsto \mathrm{Tr}(\Delta_1^T(\mathbb{I} - \frac{1}{2}YY^T)\Delta_2).\]
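
A small sketch of this formula (assuming metric is accessible as GeometricOptimizers.metric and that the underlying array is stored in the field A, as in the global_section example):

```julia
using GeometricOptimizers
using LinearAlgebra: I, tr

Y = StiefelManifold([1. 0.; 0. 1.; 0. 0.; 0. 0.])
Δ₁ = [0. -1.; 1. 0.; 1. 2.; 3. 4.]
Δ₂ = [0. -2.; 2. 0.; 5. 6.; 7. 8.]
# the canonical metric halves the contribution in the range of Y
GeometricOptimizers.metric(Y, Δ₁, Δ₂) ≈ tr(Δ₁' * (I - Y.A * Y.A' / 2) * Δ₂)
```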

source
GeometricOptimizers.optimization_step!Method
optimization_step!(o, λY, ps, dx)

Update the weights ps based on an Optimizer, a cache and first-order derivatives dx.

optimization_step! calls update! internally. update! has to be implemented for every OptimizerMethod.

Arguments

All arguments into optimization_step! are mandatory:

  1. o::Optimizer,
  2. λY::NamedTuple: this named tuple has the same keys as ps, but contains GlobalSections,
  3. ps::NamedTuple: the neural network parameters,
  4. dx::NamedTuple: the gradients stored as a NamedTuple.

All the arguments are given as NamedTuples as the neural network weights are stored in that format.

Examples

using GeometricOptimizers
using GeometricOptimizers: MomentumCache, Momentum, apply_toNT, geodesic, optimization_step!

ps = (weight = rand(StiefelManifold{Float32}, 5, 3), )
cache = apply_toNT(MomentumCache, ps)
o = Optimizer(Momentum(), cache, 0, geodesic)
λY = GlobalSection(ps)
dx = (weight = rand(Float32, 5, 3), )

# call the optimizer
optimization_step!(o, λY, ps, dx)

_test_nt(x) = typeof(x) <: NamedTuple

_test_nt(λY) & _test_nt(ps) & _test_nt(cache) & _test_nt(dx)

# output

true

Extended help

The derivatives dx here are usually obtained via an AD routine by differentiating a loss function, i.e. dx is $\nabla_xL$.

source
GeometricOptimizers.rgradMethod
rgrad(Y::GrassmannManifold, ∇L::AbstractMatrix)

Compute the Riemannian gradient for the Grassmann manifold at Y based on ∇L.

Here $Y$ is a representation of $\mathrm{span}(Y)\in{}Gr(n, N)$ and $\nabla{}L\in\mathbb{R}^{N\times{}n}$ is the Euclidean gradient.

This gradient has the property that it is orthogonal to the space spanned by $Y$.

The precise form of the mapping is:

\[\mathtt{rgrad}(Y, \nabla{}L) \mapsto \nabla{}L - YY^T\nabla{}L.\]

Note the property $Y^T\mathrm{rgrad}(Y, \nabla{}L) = \mathbb{O}.$

Also see rgrad(::StiefelManifold, ::AbstractMatrix).

Examples

using GeometricOptimizers

Y = GrassmannManifold([1 0 ; 0 1 ; 0 0; 0 0])
Δ = [1 2; 3 4; 5 6; 7 8]
rgrad(Y, Δ)

# output

4×2 Matrix{Int64}:
 0  0
 0  0
 5  6
 7  8
source
GeometricOptimizers.rgradMethod
rgrad(Y::StiefelManifold, ∇L::AbstractMatrix)

Compute the Riemannian gradient for the Stiefel manifold at Y based on ∇L.

Here $Y\in{}St(N,n)$ and $\nabla{}L\in\mathbb{R}^{N\times{}n}$ is the Euclidean gradient.

The function computes the Riemannian gradient with respect to the canonical metric: metric(::StiefelManifold, ::AbstractMatrix, ::AbstractMatrix).

The precise form of the mapping is:

\[\mathtt{rgrad}(Y, \nabla{}L) \mapsto \nabla{}L - Y(\nabla{}L)^TY\]

Note the property $Y^T\mathtt{rgrad}(Y, \nabla{}L)\in\mathcal{S}_\mathrm{skew}(n).$

Examples

using GeometricOptimizers

Y = StiefelManifold([1 0 ; 0 1 ; 0 0; 0 0])
Δ = [1 2; 3 4; 5 6; 7 8]
rgrad(Y, Δ)

# output

4×2 Matrix{Int64}:
 0  -1
 1   0
 5   6
 7   8
source
GeometricOptimizers.solver_step!Method
solver_step!(x, state, opt)

Compute a full iterate for an EuclideanOptimizer.

Info

This also performs a line search.

Examples

julia> f(x) = sum(x .^ 2 + x .^ 3 / 3);

julia> x = [1f0, 2f0]
2-element Vector{Float32}:
 1.0
 2.0

julia> opt = EuclideanOptimizer(x, f; algorithm = Newton());

julia> state = NewtonOptimizerState(x);

julia> update!(state, gradient(opt), x);

julia> solver_step!(x, state, opt)
2-element Vector{Float32}:
 0.25
 0.6666666
source
GeometricOptimizers.ΩMethod
Ω(Y::GrassmannManifold{T}, Δ::AbstractMatrix{T}) where T

Perform the canonical horizontal lift for the Grassmann manifold:

\[ \Delta \mapsto \Omega^{St}(\Delta),\]

where $\Omega^{St}$ is the canonical horizontal lift for the Stiefel manifold.

Examples

using GeometricOptimizers
using GeometricOptimizers: StiefelProjection

E = GrassmannManifold(StiefelProjection(5, 2))
Δ = [0. 0.; 0. 0.; 2. 3.; 4. 5.; 6. 7.]
GeometricOptimizers.Ω(E, Δ)

# output

5×5 SkewSymMatrix{Float64, Vector{Float64}}:
 0.0  -0.0  -2.0  -4.0  -6.0
 0.0   0.0  -3.0  -5.0  -7.0
 2.0   3.0   0.0  -0.0  -0.0
 4.0   5.0   0.0   0.0  -0.0
 6.0   7.0   0.0   0.0   0.0
source
GeometricOptimizers.ΩMethod
Ω(Y::StiefelManifold{T}, Δ::AbstractMatrix{T}) where T

Perform the canonical horizontal lift for the Stiefel manifold:

\[ \Delta \mapsto (\mathbb{I} - \frac{1}{2}YY^T)\Delta{}Y^T - Y\Delta^T(\mathbb{I} - \frac{1}{2}YY^T).\]

Internally this performs

SkewSymMatrix(2 * (I(n) - .5 * Y * Y') * Δ * Y')

It uses SkewSymMatrix to save memory.

Examples

using GeometricOptimizers
using GeometricOptimizers: StiefelProjection

E = StiefelManifold(StiefelProjection(5, 2))
Δ = [0. -1.; 1. 0.; 2. 3.; 4. 5.; 6. 7.]
GeometricOptimizers.Ω(E, Δ)

# output

5×5 SkewSymMatrix{Float64, Vector{Float64}}:
 0.0  -1.0  -2.0  -4.0  -6.0
 1.0   0.0  -3.0  -5.0  -7.0
 2.0   3.0   0.0  -0.0  -0.0
 4.0   5.0   0.0   0.0  -0.0
 6.0   7.0   0.0   0.0   0.0

Note that the output of Ω is a skew-symmetric matrix, i.e. an element of $\mathfrak{g}$.

source
GeometricOptimizers.𝔄Method
𝔄(B̂, B̄)

Compute $\mathfrak{A}(B', B'') := \sum_{n=1}^\infty \frac{1}{n!} ((B'')^TB')^{n-1}.$

This expression has the property $\mathbb{I} + B'\mathfrak{A}(B', B'')(B'')^T = \exp(B'(B'')^T).$

Examples

using GeometricOptimizers
using GeometricOptimizers: 𝔄
import Random
Random.seed!(123)

B = rand(StiefelLieAlgHorMatrix, 10, 2)
B̂ = hcat(vcat(.5 * B.A, B.B), vcat(one(B.A), zero(B.B)))
B̄ = hcat(vcat(one(B.A), zero(B.B)), vcat(-.5 * B.A, -B.B))

one(B̂ * B̄') + B̂ * 𝔄(B̂, B̄) * B̄' ≈ exp(Matrix(B))

# output

true
source
GeometricOptimizers.𝔄Method
𝔄(A)

Compute $\mathfrak{A}(A) := \sum_{n=1}^\infty \frac{1}{n!} A^{n-1}.$

Implementation

This uses a Taylor expansion that iteratively adds terms with

while norm(Aⁿ) > ε
    mul!(A_temp, Aⁿ, A)
    Aⁿ .= A_temp
    rmul!(Aⁿ, T(inv(n)))

    𝔄A += Aⁿ
    n += 1
end

until the norm of Aⁿ becomes smaller than machine precision. The counter n in the above algorithm is initialized as 2. The matrices Aⁿ and 𝔄A are initialized as the identity matrix.
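
For invertible $A$ the series equals $A^{-1}(\exp(A) - \mathbb{I})$, so $\mathbb{I} + A\,\mathfrak{A}(A) = \exp(A)$ can serve as a sanity check. A sketch (assuming the one-argument 𝔄 is accessible in the same way as in the two-argument example above):

```julia
using GeometricOptimizers
using GeometricOptimizers: 𝔄
using LinearAlgebra # for the matrix exponential
import Random
Random.seed!(123)

A = randn(4, 4)
# the identity I + A𝔄(A) = exp(A) should hold up to machine precision
one(A) + A * 𝔄(A) ≈ exp(A)
```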

source
SimpleSolvers.linesearch_problemMethod
linesearch_problem(problem, cache)

Create a SimpleSolvers.LinesearchProblem for a line search algorithm.

The variable on which this problem depends is $\alpha$.

Example

julia> x = [1, 0., 0.]
3-element Vector{Float64}:
 1.0
 0.0
 0.0

julia> f = x -> sum(x .^ 3 / 6 + x .^ 2 / 2);

julia> obj = OptimizerProblem(f, x);

julia> grad = GradientAutodiff{Float64}(obj.F, length(x));

julia> hess = HessianAutodiff{Float64}(obj.F, length(x));

julia> cache = NewtonOptimizerCache(x);

julia> state = NewtonOptimizerState(x); update!(state, grad, x);

julia> params = (x = state.x,);

julia> update!(cache, state, grad, hess, x);

julia> ls_obj = linesearch_problem(obj, grad, cache);

julia> ls_obj.F(0., params)
0.6666666666666666

julia> ls_obj.D(0., params)
-1.125
Info

Note that in the example above calling update! on the NewtonOptimizerCache requires a SimpleSolvers.Hessian.

source
SimpleSolvers.solve!Method
solve!(x, state, opt)

Solve the optimization problem described by opt::EuclideanOptimizer and store the result in x.

Examples

julia> f(x) = sum(x .^ 2 + x .^ 3 / 3);

julia> x = [1f0, 2f0]
2-element Vector{Float32}:
 1.0
 2.0

julia> opt = EuclideanOptimizer(x, f; algorithm = Newton());

julia> state = NewtonOptimizerState(x);

julia> solve!(x, state, opt)
GeometricOptimizers.OptimizerResult{Float32, Float32, Vector{Float32}, GeometricOptimizers.OptimizerStatus{Float32, Float32}}( * Convergence measures

    |x - x'|               = 7.82e-03
    |x - x'|/|x'|          = 2.56e+02
    |f(x) - f(x')|         = 6.18e-05
    |f(x) - f(x')|/|f(x')| = 6.63e+04
    |g(x) - g(x')|         = 1.57e-02
    |g(x)|                 = 6.10e-05
, Float32[4.6478817f-8, 3.0517578f-5], 9.313341f-10)

julia> x
2-element Vector{Float32}:
 4.6478817f-8
 3.0517578f-5

julia> iteration_number(state)
4

Also see solver_step!.

source