GeometricOptimizers

Documentation for GeometricOptimizers.

GeometricOptimizers.AdamType
Adam(η, ρ₁, ρ₂, δ)

Make an instance of the Adam Optimizer.

Here the cache consists of first and second moments that are updated as

\[B_1 \gets ((\rho_1 - \rho_1^t)/(1 - \rho_1^t))\cdot{}B_1 + (1 - \rho_1)/(1 - \rho_1^t)\cdot{}\nabla{}L,\]

and

\[B_2 \gets ((\rho_2 - \rho_2^t)/(1 - \rho_2^t))\cdot{}B_2 + (1 - \rho_2)/(1 - \rho_2^t)\cdot\nabla{}L\odot\nabla{}L.\]

The final velocity is computed as:

\[\mathrm{velocity} \gets -\eta{}B_1/\sqrt{B_2 + \delta}.\]

Implementation

The velocity is stored in the input to save memory:

mul!(B, -o.method.η, /ᵉˡᵉ(C.B₁, scalar_add(racᵉˡᵉ(C.B₂), o.method.δ)))

where B is the input to the update! function.

The algorithm and suggested defaults are taken from [1, page 301].
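The moment updates and the velocity computation can be sketched with plain arrays (the variable names are illustrative, not the package's internals):

```julia
# Plain-array illustration of the Adam update; η, ρ₁, ρ₂, δ as in Adam(η, ρ₁, ρ₂, δ).
η, ρ₁, ρ₂, δ = 1e-3, 0.9, 0.99, 1e-8
∇L = [0.5, -1.0]                  # gradient of the loss
B₁, B₂ = zeros(2), zeros(2)       # first and second moments
t = 1                             # current time step

B₁ .= ((ρ₁ - ρ₁^t) / (1 - ρ₁^t)) .* B₁ .+ ((1 - ρ₁) / (1 - ρ₁^t)) .* ∇L
B₂ .= ((ρ₂ - ρ₂^t) / (1 - ρ₂^t)) .* B₂ .+ ((1 - ρ₂) / (1 - ρ₂^t)) .* ∇L .^ 2
velocity = -η .* B₁ ./ sqrt.(B₂ .+ δ)
```

Note that at $t = 1$ the bias correction reduces the updates to $B_1 = \nabla{}L$ and $B_2 = \nabla{}L\odot\nabla{}L$.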

source
GeometricOptimizers.AdamCacheType
AdamCache(Y)

Store the first and second moment for Y (initialized as zeros).

First and second moments are called B₁ and B₂.

If the cache is called with an instance of a homogeneous space, e.g. the StiefelManifold $St(n,N)$ it initializes the moments as elements of $\mathfrak{g}^\mathrm{hor}$ (StiefelLieAlgHorMatrix).

Examples

using GeometricOptimizers

Y = rand(StiefelManifold, 5, 3)
GeometricOptimizers.AdamCache(Y).B₁

# output

5×5 StiefelLieAlgHorMatrix{Float64, SkewSymMatrix{Float64, Vector{Float64}}, Matrix{Float64}}:
 0.0  -0.0  -0.0  -0.0  -0.0
 0.0   0.0  -0.0  -0.0  -0.0
 0.0   0.0   0.0  -0.0  -0.0
 0.0   0.0   0.0   0.0   0.0
 0.0   0.0   0.0   0.0   0.0
source
GeometricOptimizers.AdamWithDecayType
AdamWithDecay(n_epochs, η₁=1f-2, η₂=1f-6, ρ₁=9f-1, ρ₂=9.9f-1, δ=1f-8)

Make an instance of the Adam Optimizer with weight decay.

All except the first argument (the number of epochs) have defaults.

The difference from the standard Adam optimizer is that the learning rate $\eta$ changes in each step. Apart from the time dependency of $\eta$ the two algorithms are, however, equivalent. The learning rate starts at a high value $\eta_1$ and then decreases exponentially until it reaches $\eta_2$, with

\[ \eta(t) = \gamma^t\eta_1,\]

where $\gamma = \exp(\log(\eta_2 / \eta_1) / \mathtt{n\_epochs}),$ so that $\eta(\mathtt{n\_epochs}) = \eta_2.$
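The schedule can be sketched in plain Julia (illustrative names; $\gamma$ is chosen so that the rate ends at $\eta_2$ after n_epochs steps):

```julia
n_epochs = 100
η₁, η₂ = 1e-2, 1e-6
γ = exp(log(η₂ / η₁) / n_epochs)  # γ < 1, so the learning rate decays
η(t) = γ^t * η₁

(η(0), η(n_epochs))               # starts at η₁ and ends (approximately) at η₂
```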

source
GeometricOptimizers.BFGSType
BFGS(η, δ)

Make an instance of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimizer.

η is the learning rate. δ is a stabilization parameter.

source
GeometricOptimizers.EuclideanOptimizerType
EuclideanOptimizer

The optimizer that stores all the information needed for an optimization problem.

This problem can be solved by calling solve! as shown in the examples below.

Examples

F(x) = sum(sin.(x) .^ 2)
x = ones(3)
algorithm = Newton()
state = OptimizerState(algorithm, x)
optimizer = EuclideanOptimizer(x, F; algorithm = algorithm, linesearch = Bisection())

solve!(x, state, optimizer)
x

# output

3-element Vector{Float64}:
 1.1102230246251565e-16
 1.1102230246251565e-16
 1.1102230246251565e-16

We note that this same problem may have trouble converging with other line searches:

x = ones(3)
algorithm = Newton()
state = OptimizerState(algorithm, x)
optimizer = EuclideanOptimizer(x, F; algorithm = algorithm, linesearch = Backtracking())

solve!(x, state, optimizer)
x

# output

3-element Vector{Float64}:
 1.0
 1.0
 1.0
source
GeometricOptimizers.GlobalSectionType
GlobalSection(Y)

Construct a global section for Y.

A global section $\lambda$ is a mapping from a homogeneous space $\mathcal{M}$ to the corresponding Lie group $G$ such that

\[\lambda(Y)E = Y,\]

where $E$ is the canonical element of the homogeneous space (see StiefelProjection).

Also see apply_section and global_rep.
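The defining property can be checked with plain arrays for a small example (this is an illustration, not the package's GlobalSection):

```julia
using LinearAlgebra

Y     = [1. 0.; 0. 1.; 0. 0.]          # an element of St(2, 3)
Yperp = reshape([0., 0., 1.], 3, 1)    # completion to an orthonormal basis
λY    = hcat(Y, Yperp)                 # a section: an orthogonal 3×3 matrix
E     = [1. 0.; 0. 1.; 0. 0.]          # the canonical element

λY * E ≈ Y
```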

Implementation

For an implementation of GlobalSection for a custom array (especially manifolds), the function global_section has to be generalized.

source
GeometricOptimizers.GrassmannLieAlgHorMatrixType
GrassmannLieAlgHorMatrix(B::AbstractMatrix, N::Integer, n::Integer)

Build an instance of GrassmannLieAlgHorMatrix based on an arbitrary matrix B of size $(N-n)\times{}n$.

GrassmannLieAlgHorMatrix is the horizontal component of the Lie algebra of skew-symmetric matrices (with respect to the canonical metric).

Extended help

The projection here is: $\pi:S \to SE/\sim$ where

\[E = \begin{bmatrix} \mathbb{I}_{n} \\ \mathbb{O}_{(N-n)\times{}n} \end{bmatrix},\]

and the equivalence relation is

\[V_1 \sim V_2 \iff \exists A\in\mathcal{S}_\mathrm{skew}(n) \text{ such that } V_2 = V_1 + \begin{bmatrix} A \\ \mathbb{O} \end{bmatrix}\]

An element of GrassmannLieAlgHorMatrix takes the form:

\[\begin{pmatrix} \bar{\mathbb{O}} & B^T \\ B & \mathbb{O} \end{pmatrix},\]

where $\bar{\mathbb{O}}\in\mathbb{R}^{n\times{}n}$ and $\mathbb{O}\in\mathbb{R}^{(N - n)\times(N-n)}.$

source
GeometricOptimizers.GrassmannLieAlgHorMatrixMethod
GrassmannLieAlgHorMatrix(D::AbstractMatrix, n::Integer)

Take a big matrix as input and build an instance of GrassmannLieAlgHorMatrix.

The integer $N$ in $Gr(n, N)$ here is the number of rows of D.

Extended help

If the constructor is called with a big $N\times{}N$ matrix, then the projection is performed the following way:

\[\begin{pmatrix} A & B_1 \\ B_2 & D \end{pmatrix} \mapsto \begin{pmatrix} \bar{\mathbb{O}} & -B_2^T \\ B_2 & \mathbb{O} \end{pmatrix}.\]

This can also be seen as the operation:

\[D \mapsto \Omega(E, DE - EE^TDE),\]

where $\Omega$ is the horizontal lift GeometricOptimizers.Ω.

source
GeometricOptimizers.LowerTriangularType
LowerTriangular(S::AbstractVector, n::Int)

Build a lower-triangular matrix from a vector.

A lower-triangular matrix is an $n\times{}n$ matrix that has zeros on the diagonal and in the upper triangle.

The data are stored in a vector $S$ similarly to other matrices. See UpperTriangular, SkewSymMatrix and SymmetricMatrix.

The struct has two fields: S and n. The first stores all the entries of the matrix in a sparse fashion (in a vector) and the second is the dimension $n$ for $A\in\mathbb{R}^{n\times{}n}$.

Examples

using GeometricOptimizers
S = [1, 2, 3, 4, 5, 6]
LowerTriangular(S, 4)

# output

4×4 LowerTriangular{Int64, Vector{Int64}}:
 0  0  0  0
 1  0  0  0
 2  3  0  0
 4  5  6  0
source
GeometricOptimizers.LowerTriangularMethod
LowerTriangular(A::AbstractMatrix)

Build a lower-triangular matrix from a matrix.

This is done by taking the lower left of that matrix.

Examples

using GeometricOptimizers
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
LowerTriangular(M)

# output

4×4 LowerTriangular{Int64, Vector{Int64}}:
  0   0   0  0
  5   0   0  0
  9  10   0  0
 13  14  15  0
source
GeometricOptimizers.MomentumType
Momentum(η, α)

Make an instance of the momentum optimizer.

The momentum optimizer is similar to the SimpleSolvers.Gradient. It however has a nontrivial cache that stores past history (see MomentumCache). The cache is updated via:

\[ B^{\mathrm{cache}} \gets \alpha{}B^{\mathrm{cache}} + \nabla_\mathrm{weights}L\]

and then the final velocity is computed as

\[ \mathrm{velocity} \gets - \eta{}B^{\mathrm{cache}}.\]

Implementation

To save memory the velocity is stored in the input $\nabla_WL$. This is similar to the case of the SimpleSolvers.Gradient.
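A plain-array sketch of the two formulas above (the names are illustrative):

```julia
η, α = 0.1, 0.9
∇L = [1.0, -2.0]      # gradient of the loss
B = zeros(2)          # cache storing past history

B .= α .* B .+ ∇L     # update the cache
velocity = -η .* B    # compute the final velocity
```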

source
GeometricOptimizers.NewtonOptimizerCacheType
NewtonOptimizerCache <: OptimizerCache

Keys

  • x: current iterate (this stores the guess called by the functions generated with linesearch_problem),
  • Δx: direction of the optimization step (the difference between the current and the previous iterate); this is obtained by multiplying rhs with the inverse of the Hessian,
  • g: gradient value (this stores the gradient associated with x called by the derivative part of linesearch_problem),
  • Δg: gradient difference (the difference between the current and the previous gradient); this is used for computing the OptimizerStatus,
  • rhs: the right-hand side used to compute the update,
  • H: the Hessian matrix evaluated at x.

Also compare this to SimpleSolvers.NonlinearSolverCache.

source
GeometricOptimizers.OptimizerType
Optimizer(method, cache, step, retraction)

Store the method (e.g. Adam with corresponding hyperparameters), the cache (e.g. AdamCache), the optimization step and the retraction.

It takes as input an optimization method and the parameters of a network.

Before one can call Optimizer a OptimizerMethod that stores all the hyperparameters of the optimizer needs to be specified.

source
GeometricOptimizers.OptimizerMethod
Optimizer(method, nn_params)

Allocate the cache for a specific method and nn_params for an instance of Optimizer.

Internally this calls init_optimizer_cache.

An equivalent constructor is

Optimizer(method, nn::NeuralNetwork)

Arguments

The optional keyword argument is the retraction. By default this is cayley.

source
GeometricOptimizers.OptimizerMethodType
OptimizerMethod

Each Optimizer has to be called with an OptimizerMethod. This specifies how the neural network weights are updated in each optimization step.

source
GeometricOptimizers.OptimizerStateType

OptimizerState

An OptimizerState is a data structure that is used to dispatch on different algorithms.

It needs to implement three methods,

initialize!(alg::OptimizerState, ::AbstractVector)
update!(alg::OptimizerState, ::AbstractVector)
solver_step!(::AbstractVector, alg::OptimizerState)

that initialize and update the state of the algorithm and perform an actual optimization step.

Further the following convenience methods should be implemented,

problem(alg::OptimizerState)
gradient(alg::OptimizerState)
hessian(alg::OptimizerState)
linesearch(alg::OptimizerState)

which return the problem to optimize, its gradient and (approximate) Hessian as well as the linesearch algorithm used in conjunction with the optimization algorithm if any.

See NewtonOptimizerState for a struct that was derived from OptimizerState.

Info

Note that an OptimizerState is not necessarily a NewtonOptimizerState, as we can also have other optimizers, Adam for example.

source
GeometricOptimizers.OptimizerStatusType
OptimizerStatus

Contains residuals (relative and absolute) and various convergence properties.

This is also used in OptimizerResult.

Examples

x = ones(3)
state = NewtonOptimizerState(x)
cache = NewtonOptimizerCache(x)
f = 1.
config = Options()
OptimizerStatus(state, cache, f; config = config)

# output

 * Convergence measures

    |x - x'|               = NaN
    |x - x'|/|x'|          = NaN
    |f(x) - f(x')|         = NaN
    |f(x) - f(x')|/|f(x')| = NaN
    |g(x) - g(x')|         = NaN
    |g(x)|                 = NaN
source
GeometricOptimizers.SkewSymMatrixType
SkewSymMatrix(S::AbstractVector, n::Integer)

Instantiate a skew-symmetric matrix with information stored in vector S.

A skew-symmetric matrix $A$ is a matrix with $A^T = -A$.

Internally the struct saves a vector $S$ of size $n(n-1)\div2$. The conversion is done the following way:

\[[A]_{ij} = \begin{cases} 0 & \text{if $i=j$} \\ S[( (i-2) (i-1) ) \div 2 + j] & \text{if $i>j$}\\ -S[( (j-2) (j-1) ) \div 2 + i] & \text{else}. \end{cases}\]

So $S$ stores a string of vectors taken from $A$: $S = [\tilde{a}_1, \tilde{a}_2, \ldots, \tilde{a}_n]$ with $\tilde{a}_i = [[A]_{i1},[A]_{i2},\ldots,[A]_{i(i-1)}]$.
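The index formula can be replicated with a small helper (a hypothetical name; note the minus sign for entries above the diagonal, which skew-symmetry requires):

```julia
# Reconstruct the dense matrix from the vector S using the index formula above.
function skew_from_vector(S::AbstractVector, n::Integer)
    A = zeros(eltype(S), n, n)
    for i in 1:n, j in 1:n
        if i > j
            A[i, j] = S[((i - 2) * (i - 1)) ÷ 2 + j]
        elseif i < j
            A[i, j] = -S[((j - 2) * (j - 1)) ÷ 2 + i]
        end
    end
    A
end

skew_from_vector([1, 2, 3, 4, 5, 6], 4)   # reproduces the matrix in the example below
```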

Also see SymmetricMatrix, LowerTriangular and UpperTriangular.

Examples

using GeometricOptimizers
S = [1, 2, 3, 4, 5, 6]
SkewSymMatrix(S, 4)

# output

4×4 SkewSymMatrix{Int64, Vector{Int64}}:
 0  -1  -2  -4
 1   0  -3  -5
 2   3   0  -6
 4   5   6   0
source
GeometricOptimizers.SkewSymMatrixMethod
SkewSymMatrix(A::AbstractMatrix)

Perform 0.5 * (A - A') and store the matrix in an efficient way (as a vector with $n(n-1)/2$ entries).

If the constructor is called with a matrix as input it returns a skew-symmetric matrix via the projection:

\[A \mapsto \frac{1}{2}(A - A^T).\]

Examples

using GeometricOptimizers
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
SkewSymMatrix(M)

# output

4×4 SkewSymMatrix{Float64, Vector{Float64}}:
 0.0  -1.5  -3.0  -4.5
 1.5   0.0  -1.5  -3.0
 3.0   1.5   0.0  -1.5
 4.5   3.0   1.5   0.0

Extended help

Note that the constructor is designed in such a way that it always returns matrices of type SkewSymMatrix{<:AbstractFloat} when called with a matrix, even if this matrix is of type AbstractMatrix{<:Integer}.

If the user wishes to allocate a matrix SkewSymMatrix{<:Integer} then call:

SkewSymMatrix(::AbstractVector, n::Integer)

Note that this is different from LowerTriangular and UpperTriangular as no projection takes place there.

source
GeometricOptimizers.StiefelLieAlgHorMatrixType
StiefelLieAlgHorMatrix(A::SkewSymMatrix, B::AbstractMatrix, N::Integer, n::Integer)

Build an instance of StiefelLieAlgHorMatrix based on a skew-symmetric matrix A and an arbitrary matrix B.

An element of StiefelLieAlgHorMatrix takes the form:

\[\begin{pmatrix} A & B^T \\ B & \mathbb{O} \end{pmatrix},\]

where $A$ is skew-symmetric (this is SkewSymMatrix in GeometricOptimizers).

Also see GrassmannLieAlgHorMatrix.

Extended help

StiefelLieAlgHorMatrix is the horizontal component of the Lie algebra of skew-symmetric matrices (with respect to the canonical metric).

The projection here is: $\pi:S \to SE$ where

\[E = \begin{bmatrix} \mathbb{I}_{n} \\ \mathbb{O}_{(N-n)\times{}n} \end{bmatrix}.\]

The matrix $E$ is implemented under StiefelProjection in GeometricOptimizers.

source
GeometricOptimizers.StiefelLieAlgHorMatrixMethod
StiefelLieAlgHorMatrix(D::AbstractMatrix, n::Integer)

Take a big matrix as input and build an instance of StiefelLieAlgHorMatrix.

The integer $N$ in $St(n, N)$ is the number of rows of D.

Extended help

If the constructor is called with a big $N\times{}N$ matrix, then the projection is performed the following way:

\[\begin{pmatrix} A & B_1 \\ B_2 & D \end{pmatrix} \mapsto \begin{pmatrix} \mathrm{skew}(A) & -B_2^T \\ B_2 & \mathbb{O} \end{pmatrix}.\]

The operation $\mathrm{skew}:\mathbb{R}^{n\times{}n}\to\mathcal{S}_\mathrm{skew}(n)$ is the skew-symmetrization operation. This is equivalent to calling of SkewSymMatrix with an $n\times{}n$ matrix.

This can also be seen as the operation:

\[D \mapsto \Omega(E, DE) = \mathrm{skew}\left(2 \left(\mathbb{I} - \frac{1}{2} E E^T \right) DE E^T\right).\]
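This identity can be verified numerically with plain arrays (a sketch; Ω, skew etc. are local names, not the package functions):

```julia
using LinearAlgebra, Random

# Check that skew(2(I - ½EEᵀ)DEEᵀ) reproduces the block structure
# [skew(A)  -B₂ᵀ; B₂  𝕆] from the projection above.
Random.seed!(123)
N, n = 5, 2
D = randn(N, N)
E = vcat(Matrix(1.0I, n, n), zeros(N - n, n))
skew(M) = (M - M') / 2

Ω = skew(2 * (I - E * E' / 2) * D * E * E')

A  = D[1:n, 1:n]
B₂ = D[(n + 1):N, 1:n]
Ω[1:n, 1:n] ≈ skew(A) && Ω[(n + 1):N, 1:n] ≈ B₂
```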

Also see GeometricOptimizers.Ω.

source
GeometricOptimizers.StiefelManifoldType
StiefelManifold <: Manifold

An implementation of the Stiefel manifold [2]. The Stiefel manifold is the collection of all matrices $Y\in\mathbb{R}^{N\times{}n}$ whose columns are orthonormal, i.e.

\[ St(n, N) = \{Y: Y^TY = \mathbb{I}_n \}.\]

The Stiefel manifold can be shown to have manifold structure (as the name suggests) and this is heavily used in GeometricOptimizers. It is further a compact space. More information can be found in the docstrings for rgrad(::StiefelManifold, ::AbstractMatrix) and metric(::StiefelManifold, ::AbstractMatrix, ::AbstractMatrix).

source
GeometricOptimizers.StiefelProjectionType
StiefelProjection(backend, T, N, n)

Make a matrix of the form $\begin{bmatrix} \mathbb{I} & \mathbb{O} \end{bmatrix}^T$ for a specific backend and data type.

An array that essentially does vcat(I(n), zeros(N-n, n)) with GPU support.

Extended help

An instance of StiefelProjection should technically also belong to StiefelManifold.

source
GeometricOptimizers.StiefelProjectionMethod
StiefelProjection(A::AbstractMatrix)

Extract necessary information from A and build an instance of StiefelProjection.

Necessary information here refers to the backend, the data type and the size of the matrix.

source
GeometricOptimizers.StiefelProjectionMethod
StiefelProjection(B::AbstractLieAlgHorMatrix)

Extract necessary information from B and build an instance of StiefelProjection.

Necessary information here refers to the backend, the data type and the size of the matrix.

The size is queried through B.N and B.n.

Examples

using GeometricOptimizers
using GeometricOptimizers: StiefelProjection

B₁ = rand(StiefelLieAlgHorMatrix, 5, 2)
B₂ = rand(GrassmannLieAlgHorMatrix, 5, 2)
E = [1. 0.; 0. 1.; 0. 0.; 0. 0.; 0. 0.]

StiefelProjection(B₁) ≈ StiefelProjection(B₂) ≈ E 

# output

true
source
GeometricOptimizers.SymmetricMatrixType
SymmetricMatrix(S::AbstractVector, n::Integer)

Instantiate a symmetric matrix with information stored in vector S.

A SymmetricMatrix $A$ is a matrix with $A^T = A$.

Internally the struct saves a vector $S$ of size $n(n+1)\div2$. The conversion is done the following way:

\[[A]_{ij} = \begin{cases} S[( (i-1) i ) \div 2 + j] & \text{if $i\geq{}j$}\\ S[( (j-1) j ) \div 2 + i] & \text{else}. \end{cases}\]

So $S$ stores a string of vectors taken from $A$: $S = [\tilde{a}_1, \tilde{a}_2, \ldots, \tilde{a}_n]$ with $\tilde{a}_i = [[A]_{i1},[A]_{i2},\ldots,[A]_{ii}]$.
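The index formula can be replicated with a small helper (a hypothetical name, not part of the package):

```julia
# Reconstruct the dense matrix from the vector S using the index formula above.
function sym_from_vector(S::AbstractVector, n::Integer)
    A = zeros(eltype(S), n, n)
    for i in 1:n, j in 1:n
        A[i, j] = i ≥ j ? S[((i - 1) * i) ÷ 2 + j] : S[((j - 1) * j) ÷ 2 + i]
    end
    A
end

sym_from_vector([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4)   # reproduces the example below
```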

Also see SkewSymMatrix, LowerTriangular and UpperTriangular.

Examples

using GeometricOptimizers
S = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
SymmetricMatrix(S, 4)

# output

4×4 SymmetricMatrix{Int64, Vector{Int64}}:
 1  2  4   7
 2  3  5   8
 4  5  6   9
 7  8  9  10
source
GeometricOptimizers.SymmetricMatrixMethod
SymmetricMatrix(A::AbstractMatrix)

Perform a projection and store the matrix in an efficient way (as a vector with $n(n+1)/2$ entries).

If the constructor is called with a matrix as input it returns a symmetric matrix via the projection:

\[A \mapsto \frac{1}{2}(A + A^T).\]

Examples

using GeometricOptimizers
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
SymmetricMatrix(M)

# output

4×4 SymmetricMatrix{Float64, Vector{Float64}}:
 1.0   3.5   6.0   8.5
 3.5   6.0   8.5  11.0
 6.0   8.5  11.0  13.5
 8.5  11.0  13.5  16.0

Extended help

Note that the constructor is designed in such a way that it always returns matrices of type SymmetricMatrix{<:AbstractFloat} when called with a matrix, even if this matrix is of type AbstractMatrix{<:Integer}.

If the user wishes to allocate a matrix SymmetricMatrix{<:Integer} then call

SymmetricMatrix(::AbstractVector, n::Integer)

Note that this is different from LowerTriangular and UpperTriangular as no projection takes place there.

source
GeometricOptimizers.UpperTriangularType
UpperTriangular(S::AbstractVector, n::Int)

Build an upper-triangular matrix from a vector.

An upper-triangular matrix is an $n\times{}n$ matrix that has zeros on the diagonal and in the lower triangle.

The data are stored in a vector $S$ similarly to other matrices. See LowerTriangular, SkewSymMatrix and SymmetricMatrix.

The struct has two fields: S and n. The first stores all the entries of the matrix in a sparse fashion (in a vector) and the second is the dimension $n$ for $A\in\mathbb{R}^{n\times{}n}$.

Examples

using GeometricOptimizers
S = [1, 2, 3, 4, 5, 6]
UpperTriangular(S, 4)

# output

4×4 UpperTriangular{Int64, Vector{Int64}}:
 0  1  2  4
 0  0  3  5
 0  0  0  6
 0  0  0  0
source
GeometricOptimizers.UpperTriangularMethod
UpperTriangular(A::AbstractMatrix)

Build an upper-triangular matrix from a matrix.

This is done by taking the upper right of that matrix.

Examples

using GeometricOptimizers
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
UpperTriangular(M)

# output

4×4 UpperTriangular{Int64, Vector{Int64}}:
 0  2  3   4
 0  0  7   8
 0  0  0  12
 0  0  0   0
source
GeometricOptimizers._BFGSCacheType
BFGSCache(B)

Make the cache for the BFGS optimizer based on the array B.

It stores an array for the gradient of the previous time step B and the inverse of the Hessian matrix H.

The cache for the inverse of the Hessian is initialized with the identity. The cache for the previous gradient information is initialized with the zero vector.

Note that the cache for H is changed iteratively, whereas the cache for B is newly assigned at every time step.

source
GeometricOptimizers._GradientType
Gradient(η)

Make an instance of a gradient optimizer.

This is the simplest neural network optimizer. It has no cache and computes the final velocity as:

\[ \mathrm{velocity} \gets - \eta\nabla_\mathrm{weight}L.\]

Implementation

The operations are done as memory efficiently as possible. This means the provided $\nabla_WL$ is mutated via:

rmul!(∇L, -method.η)
source
Base.:*Method
λY * Y

Apply the group element λY to Y.

Here λY is an element of a Lie group and Y is an element of a homogeneous space.

source
Base.randMethod
rand(backend, manifold_type, N, n)

Draw random elements for a specific device.

Examples

Random elements of the manifold can be allocated on GPU. Call ...

rand(CUDABackend(), StiefelManifold{Float32}, N, n)

... for drawing elements on a CUDA device.

source
Base.randMethod
rand(manifold_type, N, n)

Draw random elements from the Stiefel and the Grassmann manifold.

Because both of these manifolds are compact spaces we can sample them uniformly [4].

Examples

When we call ...

using GeometricOptimizers
using GeometricOptimizers: _round # hide
import Random
Random.seed!(123)

N, n = 5, 3
Y = rand(StiefelManifold{Float32}, N, n)
_round(Y; digits = 5) # hide

# output

5×3 StiefelManifold{Float32, Matrix{Float32}}:
 -0.27575   0.32991   0.77275
 -0.62485  -0.33224  -0.0686
 -0.69333   0.36724  -0.18988
 -0.09295  -0.73145   0.46064
  0.2102    0.33301   0.38717

... the sampling is done by first allocating a random matrix of size $N\times{}n$ via Y = randn(Float32, N, n).

We then perform a QR decomposition Q, R = qr(Y) with the qr function from the LinearAlgebra package (this is using Householder reflections internally).

The final output is then given by the first n columns of the Q matrix.
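The three steps described above can be sketched with plain LinearAlgebra calls (this mirrors the description, not the package internals):

```julia
using LinearAlgebra, Random

Random.seed!(123)
N, n = 5, 3
A = randn(Float32, N, n)     # 1. allocate a random N×n matrix
Q = Matrix(qr(A).Q)          # 2. QR decomposition (Householder reflections)
Y = Q[:, 1:n]                # 3. keep the first n columns

Y' * Y ≈ Matrix{Float32}(I, n, n)   # the columns are orthonormal
```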

source
Base.vecMethod
vec(A::AbstractTriangular)

Return the associated vector to $A$.

Examples

using GeometricOptimizers

M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
LowerTriangular(M) |> vec

# output

6-element Vector{Int64}:
  5
  9
 10
 13
 14
 15
source
Base.vecMethod
vec(A)

Output the associated vector of A.

Examples

using GeometricOptimizers

M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
SkewSymMatrix(M) |> vec

# output

6-element Vector{Float64}:
 1.5
 3.0
 1.5
 4.5
 3.0
 1.5
source
Base.vecMethod
vec(A::StiefelLieAlgHorMatrix)

Vectorize A.

Examples

using GeometricOptimizers

A = SkewSymMatrix([1, ], 2)
B = [2 3; ]
B̄ = StiefelLieAlgHorMatrix(A, B, 3, 2)
B̄ |> vec

# output

vcat(1-element Vector{Int64}, 2-element Vector{Int64}):
 1
 2
 3

Implementation

This is using Vcat from the package LazyArrays.

source
GeometricBase.update!Method
update!(cache::NewtonOptimizerCache, x, g, hes)

Update an instance of NewtonOptimizerCache based on x.

This is used in update!(::OptimizerState, ::AbstractVector).

This sets:

\[\begin{aligned} x^\mathtt{cache} & \gets x, \\ g^\mathtt{cache} & \gets g, \\ \mathrm{rhs}^\mathtt{cache} & \gets -g, \\ H^\mathtt{cache} & \gets H(x), \\ \delta^\mathtt{cache} & \gets (H^\mathtt{cache})^{-1}\mathrm{rhs}^\mathtt{cache}, \end{aligned}\]

where we wrote $H$ for the Hessian (i.e. the input argument hes).

source
GeometricBase.update!Method
update!(state::NewtonOptimizerState, gradient, x)

Update an instance of NewtonOptimizerState based on x and gradient, where gradient is of type SimpleSolvers.Gradient.

Examples

If we only call update! once there are still NaNs for x̄, ḡ and f̄.

f(x) = sum(x.^2)
x = [1., 2.]
state = NewtonOptimizerState(x)
grad = GradientAutodiff{Float64}(f, length(x))
update!(state, grad, x)

# output

NewtonOptimizerState{Float64, Vector{Float64}, Vector{Float64}}(0, [1.0, 2.0], [NaN, NaN], [2.0, 4.0], [NaN, NaN], 5.0, NaN)
source
GeometricBase.update!Method
update!(o, cache, B)

Update the cache and output a final velocity that is stored in B.

Note that $B\in\mathfrak{g}^\mathrm{hor}$ in general.

In the manifold case the final velocity is the input to a retraction.

source
GeometricBase.update!Method
update!(o::Optimizer{<:BFGS}, C, B)

Perform an update with the BFGS optimizer.

C is the cache, B contains the gradient information (the output of global_rep in general).

First we compute the final velocity with

vecS = -o.method.η * C.H * vec(B)

and then we update H

C.H .= (𝕀 - ρ * SY) * C.H * (𝕀 - ρ * SY') + ρ * vecS * vecS'

where SY is vecS * Y' and 𝕀 is the identity.

Implementation

For stability we use δ for computing ρ:

ρ = 1. / (vecS' * Y + o.method.δ)

This is similar to how δ is used in the Adam optimizer.

Extended help

If we have weights on a Manifold then the updates are slightly more difficult. In this case the vec operation has to be generalized to the corresponding global tangent space.

source
GeometricBase.update!Method
update!(cache, x, g)

Update the BFGSCache based on x and g.

Extended help

The update rule used here can be found in [5] and [3]:

It does:

\[\begin{aligned} \delta & \gets x^{(k)} - x^{(k-1)}, \\ \gamma & \gets \nabla{}f^{(k)} - \nabla{}f^{(k-1)}, \\ T_1 & \gets \delta\gamma^TQ, \\ T_2 & \gets Q\gamma\delta^T, \\ T_3 & \gets \left(1 + \frac{\gamma^TQ\gamma}{\delta^T\gamma}\right)\delta\delta^T,\\ Q & \gets Q - (T_1 + T_2 - T_3)/(\delta^T\gamma). \end{aligned}\]
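The update rule can be written out with plain arrays (a sketch with illustrative names; for a quadratic objective the updated $Q$ satisfies the secant equation $Q\gamma = \delta$):

```julia
using LinearAlgebra

# Inverse-Hessian update of BFGS, following the rule above.
function bfgs_update(Q, δ, γ)
    δγ = dot(δ, γ)                       # δᵀγ
    T₁ = δ * γ' * Q
    T₂ = Q * γ * δ'
    T₃ = (1 + dot(γ, Q * γ) / δγ) * δ * δ'
    Q - (T₁ + T₂ - T₃) / δγ
end

A = [2.0 0.0; 0.0 4.0]                   # Hessian of f(x) = ½xᵀAx
δ = [0.1, -0.2]                          # step taken
γ = A * δ                                # corresponding gradient difference
Q = bfgs_update(Matrix(1.0I, 2, 2), δ, γ)

Q * γ ≈ δ                                # secant equation
```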

source
GeometricOptimizers.apply_sectionMethod
apply_section(λY::GlobalSection{T, AT}, Y₂::AT) where {T, AT <: StiefelManifold{T}}

Apply λY to Y₂.

Mathematically this is the group action of the element $\lambda{}Y\in{}G$ on the element $Y_2$ of the homogeneous space $\mathcal{M}$.

Internally it calls apply_section!.

source
GeometricOptimizers.cayleyMethod
cayley(B̄::StiefelLieAlgHorMatrix)

Compute the Cayley retraction of B̄.

Implementation

Internally this is using

\[\mathrm{Cayley}(\bar{B}) = \mathbb{I} + \frac{1}{2} B' (\mathbb{I}_{2n} - \frac{1}{2} (B'')^T B')^{-1} (B'')^T (\mathbb{I} + \frac{1}{2} \bar{B}),\]

with

\[\bar{B} = \begin{bmatrix} A & -B^T \\ B & \mathbb{O} \end{bmatrix} = \begin{bmatrix} \frac{1}{2}A & \mathbb{I} \\ B & \mathbb{O} \end{bmatrix} \begin{bmatrix} \mathbb{I} & \mathbb{O} \\ \frac{1}{2}A & -B^T \end{bmatrix} =: B'(B'')^T,\]

i.e. $\bar{B}$ is expressed as a product of two $N\times{}2n$ matrices.
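The factorization $\bar{B} = B'(B'')^T$ can be checked numerically with plain arrays (B′ and B′′ are local names for the two $N\times{}2n$ factors):

```julia
using LinearAlgebra, Random

Random.seed!(1)
N, n = 5, 2
M = randn(n, n)
A = (M - M') / 2                              # skew-symmetric block
B = randn(N - n, n)
B̄ = [A (-B'); B zeros(N - n, N - n)]          # element of 𝔤ʰᵒʳ

B′  = [(A / 2) Matrix(1.0I, n, n); B zeros(N - n, n)]     # N × 2n
B′′ = [Matrix(1.0I, n, n) (A' / 2); zeros(N - n, n) (-B)] # N × 2n

B′ * B′′' ≈ B̄
```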

source
GeometricOptimizers.cayleyMethod
cayley(Y::Manifold, Δ)

Take as input an element of a manifold Y and a tangent vector in Δ in the corresponding tangent space and compute the Cayley retraction.

In different notation: take as input an element $x$ of $\mathcal{M}$ and an element of $T_x\mathcal{M}$ and return $\mathrm{Cayley}(v_x).$

Examples

using GeometricOptimizers

Y = StiefelManifold([1. 0. 0.;]' |> Matrix)
Δ = [0. .5 0.;]' |> Matrix
Y₂ = GeometricOptimizers.cayley(Y, Δ)

Y₂' * Y₂ ≈ [1.;]

# output

true

See the example in geodesic(::Manifold{T}, ::AbstractMatrix{T}).

source
GeometricOptimizers.geodesicMethod
geodesic(B̄::StiefelLieAlgHorMatrix)

Compute the geodesic of an element in StiefelLieAlgHorMatrix.

Implementation

Internally this is using:

\[\mathbb{I} + B'\mathfrak{A}(B', B'')B'',\]

with

\[\bar{B} = \begin{bmatrix} A & -B^T \\ B & \mathbb{O} \end{bmatrix} = \begin{bmatrix} \frac{1}{2}A & \mathbb{I} \\ B & \mathbb{O} \end{bmatrix} \begin{bmatrix} \mathbb{I} & \mathbb{O} \\ \frac{1}{2}A & -B^T \end{bmatrix} =: B'(B'')^T.\]

This is using a computationally efficient version of the matrix exponential $\mathfrak{A}$.

See GeometricOptimizers.𝔄.

source
GeometricOptimizers.geodesicMethod
geodesic(Y::Manifold, Δ)

Take as input an element of a manifold Y and a tangent vector in Δ in the corresponding tangent space and compute the geodesic (exponential map).

In different notation: take as input an element $x$ of $\mathcal{M}$ and an element of $T_x\mathcal{M}$ and return $\mathtt{geodesic}(x, v_x) = \exp(v_x).$

Examples

using GeometricOptimizers

Y = StiefelManifold([1. 0. 0.;]' |> Matrix)
Δ = [0. .5 0.;]' |> Matrix
Y₂ = GeometricOptimizers.geodesic(Y, Δ)

Y₂' * Y₂ ≈ [1.;]

# output

true

Implementation

Internally this geodesic method calls geodesic(::StiefelLieAlgHorMatrix).

source
GeometricOptimizers.global_repMethod
global_rep(λY::GlobalSection{T, AT}, Δ::AbstractMatrix{T}) where {T, AT<:GrassmannManifold{T}}

Express Δ (an element of the tangent space of Y) as an instance of GrassmannLieAlgHorMatrix.

The method global_rep for GrassmannManifold is similar to that for StiefelManifold.

Examples

using GeometricOptimizers
using GeometricOptimizers: _round
import Random 

Random.seed!(123)

Y = rand(GrassmannManifold, 6, 3)
Δ = rgrad(Y, randn(6, 3))
λY = GlobalSection(Y)

_round(global_rep(λY, Δ); digits = 3)

# output

6×6 GrassmannLieAlgHorMatrix{Float64, Matrix{Float64}}:
  0.0     0.0     0.0     0.981  -2.058   0.4
  0.0     0.0     0.0    -0.424   0.733  -0.919
  0.0     0.0     0.0    -1.815   1.409   1.085
 -0.981   0.424   1.815   0.0     0.0     0.0
  2.058  -0.733  -1.409   0.0     0.0     0.0
 -0.4     0.919  -1.085   0.0     0.0     0.0
source
GeometricOptimizers.global_repMethod
global_rep(λY::GlobalSection{T, AT}, Δ::AbstractMatrix{T}) where {T, AT<:StiefelManifold{T}}

Express Δ (an element of the tangent space of Y) as an instance of StiefelLieAlgHorMatrix.

This maps an element from $T_Y\mathcal{M}$ to an element of $\mathfrak{g}^\mathrm{hor}$.

These two spaces are isomorphic, and the isomorphism is established through $\lambda(Y)\in{}G$ via:

\[T_Y\mathcal{M} \to \mathfrak{g}^{\mathrm{hor}}, \Delta \mapsto \lambda(Y)^{-1}\Omega(Y, \Delta)\lambda(Y).\]

Also see GeometricOptimizers.Ω.

Examples

using GeometricOptimizers
using GeometricOptimizers: _round
import Random 

Random.seed!(123)

Y = rand(StiefelManifold, 6, 3)
Δ = rgrad(Y, randn(6, 3))
λY = GlobalSection(Y)

_round(global_rep(λY, Δ); digits = 3)

# output

6×6 StiefelLieAlgHorMatrix{Float64, SkewSymMatrix{Float64, Vector{Float64}}, Matrix{Float64}}:
  0.0     0.679   1.925   0.981  -2.058   0.4
 -0.679   0.0     0.298  -0.424   0.733  -0.919
 -1.925  -0.298   0.0    -1.815   1.409   1.085
 -0.981   0.424   1.815   0.0     0.0     0.0
  2.058  -0.733  -1.409   0.0     0.0     0.0
 -0.4     0.919  -1.085   0.0     0.0     0.0

Implementation

The function global_rep does not in fact perform the entire map $\lambda(Y)^{-1}\Omega(Y, \Delta)\lambda(Y)$ but only

\[\Delta \mapsto \mathrm{skew}(Y^T\Delta),\]

to get the small skew-symmetric matrix $A\in\mathcal{S}_\mathrm{skew}(n)$ and

\[\Delta \mapsto (\lambda(Y)_{[1:N, n:N]}^T \Delta)_{[1:(N-n), 1:n]},\]

to get the arbitrary matrix $B\in\mathbb{R}^{(N-n)\times{}n}$.

source
GeometricOptimizers.global_sectionMethod
global_section(Y::StiefelManifold)

Compute a matrix of size $N\times(N-n)$ whose columns are orthogonal to the columns in Y.

This matrix is also called $Y_\perp$ [68].

Examples

using GeometricOptimizers
using GeometricOptimizers: global_section
import Random

Random.seed!(123)

Y = StiefelManifold([1. 0.; 0. 1.; 0. 0.; 0. 0.])

round.(Matrix(global_section(Y)); digits = 3)

# output

4×2 Matrix{Float64}:
 0.0    -0.0
 0.0     0.0
 0.936  -0.353
 0.353   0.936

Further note that we convert the QRCompactWYQ object to a Matrix before we display it.

Implementation

The implementation is done with a QR decomposition (LinearAlgebra.qr!). Internally we do:

A = randn(N, N - n) # or the gpu equivalent
A = A - Y.A * (Y.A' * A)
qr!(A).Q
source
GeometricOptimizers.meets_stopping_criteriaMethod
meets_stopping_criteria(status, config, iterations)

Check if the optimizer has converged.

Implementation

meets_stopping_criteria checks if one of the following is true:

  • converged (the output of SimpleSolvers.assess_convergence) is true and iterations $\geq$ config.min_iterations,
  • if config.allow_f_increases is false: status.f_increased is true,
  • iterations $\geq$ config.max_iterations,
  • status.rxₐ $>$ config.x_abstol_break
  • status.rxᵣ $>$ config.x_reltol_break
  • status.rfₐ $>$ config.f_abstol_break
  • status.rfᵣ $>$ config.f_reltol_break
  • status.rg $>$ config.g_restol_break
  • status.x_isnan
  • status.f_isnan
  • status.g_isnan
source
GeometricOptimizers.metricMethod
metric(Y::GrassmannManifold, Δ₁::AbstractMatrix, Δ₂::AbstractMatrix)

Compute the metric for vectors Δ₁ and Δ₂ at Y.

The representation of the Grassmann manifold is realized as a quotient space of the Stiefel manifold.

The metric for the Grassmann manifold is:

\[g^{Gr}_Y(\Delta_1, \Delta_2) = g^{St}_Y(\Delta_1, \Delta_2) = \mathrm{Tr}(\Delta_1^T (\mathbb{I} - \frac{1}{2} Y Y^T) \Delta_2) = \mathrm{Tr}(\Delta_1^T \Delta_2),\]

where we used that $Y^T\Delta_i = \mathbb{O}$ for $i = 1, 2.$

source
GeometricOptimizers.metricMethod
metric(Y::StiefelManifold, Δ₁::AbstractMatrix, Δ₂::AbstractMatrix)

Compute the inner product of Δ₁ and Δ₂ at Y.

This uses the canonical Riemannian metric for the Stiefel manifold:

\[g_Y: (\Delta_1, \Delta_2) \mapsto \mathrm{Tr}(\Delta_1^T(\mathbb{I} - \frac{1}{2}YY^T)\Delta_2).\]
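
A small sketch of this formula (assuming metric is accessible as GeometricOptimizers.metric and that the underlying array is stored in the field A, as in the global_section example):

```julia
using GeometricOptimizers
using LinearAlgebra: I, tr

Y = StiefelManifold([1. 0.; 0. 1.; 0. 0.; 0. 0.])
Δ₁ = [0. -1.; 1. 0.; 1. 2.; 3. 4.]
Δ₂ = [0. -2.; 2. 0.; 5. 6.; 7. 8.]
# the canonical metric halves the contribution in the range of Y
GeometricOptimizers.metric(Y, Δ₁, Δ₂) ≈ tr(Δ₁' * (I - Y.A * Y.A' / 2) * Δ₂)
```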

source
GeometricOptimizers.optimization_step!Method
optimization_step!(o, λY, ps, dx)

Update the weights ps based on an Optimizer, a cache and first-order derivatives dx.

optimization_step! calls update! internally. update! has to be implemented for every OptimizerMethod.

Arguments

All arguments into optimization_step! are mandatory:

  1. o::Optimizer,
  2. λY::NamedTuple: this named tuple has the same keys as ps, but contains GlobalSections,
  3. ps::NamedTuple: the neural network parameters,
  4. dx::NamedTuple: the gradients stored as a NamedTuple.

All the arguments are given as NamedTuples as the neural network weights are stored in that format.

Examples

using GeometricOptimizers
using GeometricOptimizers: MomentumCache, Momentum, apply_toNT, geodesic, optimization_step!

ps = (weight = rand(StiefelManifold{Float32}, 5, 3), )
cache = apply_toNT(MomentumCache, ps)
o = Optimizer(Momentum(), cache, 0, geodesic)
λY = GlobalSection(ps)
dx = (weight = rand(Float32, 5, 3), )

# call the optimizer
optimization_step!(o, λY, ps, dx)

_test_nt(x) = typeof(x) <: NamedTuple

_test_nt(λY) & _test_nt(ps) & _test_nt(cache) & _test_nt(dx)

# output

true

Extended help

The derivatives dx here are usually obtained via an AD routine by differentiating a loss function, i.e. dx is $\nabla_xL$.

source
GeometricOptimizers.rgradMethod
rgrad(Y::GrassmannManifold, ∇L::AbstractMatrix)

Compute the Riemannian gradient for the Grassmann manifold at Y based on ∇L.

Here $Y$ is a representation of $\mathrm{span}(Y)\in{}Gr(n, N)$ and $\nabla{}L\in\mathbb{R}^{N\times{}n}$ is the Euclidean gradient.

This gradient has the property that it is orthogonal to the space spanned by $Y$.

The precise form of the mapping is:

\[\mathtt{rgrad}(Y, \nabla{}L) \mapsto \nabla{}L - YY^T\nabla{}L.\]

Note the property $Y^T\mathrm{rgrad}(Y, \nabla{}L) = \mathbb{O}.$

Also see rgrad(::StiefelManifold, ::AbstractMatrix).

Examples

using GeometricOptimizers

Y = GrassmannManifold([1 0 ; 0 1 ; 0 0; 0 0])
Δ = [1 2; 3 4; 5 6; 7 8]
rgrad(Y, Δ)

# output

4×2 Matrix{Int64}:
 0  0
 0  0
 5  6
 7  8
source
GeometricOptimizers.rgradMethod
rgrad(Y::StiefelManifold, ∇L::AbstractMatrix)

Compute the Riemannian gradient for the Stiefel manifold at Y based on ∇L.

Here $Y\in{}St(N,n)$ and $\nabla{}L\in\mathbb{R}^{N\times{}n}$ is the Euclidean gradient.

The function computes the Riemannian gradient with respect to the canonical metric: metric(::StiefelManifold, ::AbstractMatrix, ::AbstractMatrix).

The precise form of the mapping is:

\[\mathtt{rgrad}(Y, \nabla{}L) \mapsto \nabla{}L - Y(\nabla{}L)^TY\]

Note the property $Y^T\mathtt{rgrad}(Y, \nabla{}L)\in\mathcal{S}_\mathrm{skew}(n).$

Examples

using GeometricOptimizers

Y = StiefelManifold([1 0 ; 0 1 ; 0 0; 0 0])
Δ = [1 2; 3 4; 5 6; 7 8]
rgrad(Y, Δ)

# output

4×2 Matrix{Int64}:
 0  -1
 1   0
 5   6
 7   8
source
GeometricOptimizers.solver_step!Method
solver_step!(x, state, opt)

Compute a full iterate for an EuclideanOptimizer.

Info

This also performs a line search.

Examples

julia> f(x) = sum(x .^ 2 + x .^ 3 / 3);

julia> x = [1f0, 2f0]
2-element Vector{Float32}:
 1.0
 2.0

julia> opt = EuclideanOptimizer(x, f; algorithm = Newton());

julia> state = NewtonOptimizerState(x);

julia> update!(state, gradient(opt), x);

julia> solver_step!(x, state, opt)
2-element Vector{Float32}:
 0.25
 0.6666666
source
GeometricOptimizers.ΩMethod
Ω(Y::GrassmannManifold{T}, Δ::AbstractMatrix{T}) where T

Perform the canonical horizontal lift for the Grassmann manifold:

\[ \Delta \mapsto \Omega^{St}(\Delta),\]

where $\Omega^{St}$ is the canonical horizontal lift for the Stiefel manifold.

Examples

using GeometricOptimizers
using GeometricOptimizers: StiefelProjection

E = GrassmannManifold(StiefelProjection(5, 2))
Δ = [0. 0.; 0. 0.; 2. 3.; 4. 5.; 6. 7.]
GeometricOptimizers.Ω(E, Δ)

# output

5×5 SkewSymMatrix{Float64, Vector{Float64}}:
 0.0  -0.0  -2.0  -4.0  -6.0
 0.0   0.0  -3.0  -5.0  -7.0
 2.0   3.0   0.0  -0.0  -0.0
 4.0   5.0   0.0   0.0  -0.0
 6.0   7.0   0.0   0.0   0.0
source
GeometricOptimizers.ΩMethod
Ω(Y::StiefelManifold{T}, Δ::AbstractMatrix{T}) where T

Perform the canonical horizontal lift for the Stiefel manifold:

\[ \Delta \mapsto (\mathbb{I} - \frac{1}{2}YY^T)\Delta{}Y^T - Y\Delta^T(\mathbb{I} - \frac{1}{2}YY^T).\]

Internally this performs

SkewSymMatrix(2 * (I(n) - .5 * Y * Y') * Δ * Y')

It uses SkewSymMatrix to save memory.

Examples

using GeometricOptimizers
using GeometricOptimizers: StiefelProjection

E = StiefelManifold(StiefelProjection(5, 2))
Δ = [0. -1.; 1. 0.; 2. 3.; 4. 5.; 6. 7.]
GeometricOptimizers.Ω(E, Δ)

# output

5×5 SkewSymMatrix{Float64, Vector{Float64}}:
 0.0  -1.0  -2.0  -4.0  -6.0
 1.0   0.0  -3.0  -5.0  -7.0
 2.0   3.0   0.0  -0.0  -0.0
 4.0   5.0   0.0   0.0  -0.0
 6.0   7.0   0.0   0.0   0.0

Note that the output of Ω is a skew-symmetric matrix, i.e. an element of $\mathfrak{g}$.

source
GeometricOptimizers.𝔄Method
𝔄(B̂, B̄)

Compute $\mathfrak{A}(B', B'') := \sum_{n=1}^\infty \frac{1}{n!} ((B'')^TB')^{n-1}.$

This expression has the property $\mathbb{I} + B'\mathfrak{A}(B', B'')(B'')^T = \exp(B'(B'')^T).$

Examples

using GeometricOptimizers
using GeometricOptimizers: 𝔄
import Random
Random.seed!(123)

B = rand(StiefelLieAlgHorMatrix, 10, 2)
B̂ = hcat(vcat(.5 * B.A, B.B), vcat(one(B.A), zero(B.B)))
B̄ = hcat(vcat(one(B.A), zero(B.B)), vcat(-.5 * B.A, -B.B))

one(B̂ * B̄') + B̂ * 𝔄(B̂, B̄) * B̄' ≈ exp(Matrix(B))

# output

true
source
GeometricOptimizers.𝔄Method
𝔄(A)

Compute $\mathfrak{A}(A) := \sum_{n=1}^\infty \frac{1}{n!} A^{n-1}.$

Implementation

This uses a Taylor expansion that iteratively adds terms with

while norm(Aⁿ) > ε
    mul!(A_temp, Aⁿ, A)
    Aⁿ .= A_temp
    rmul!(Aⁿ, T(inv(n)))

    𝔄A += Aⁿ
    n += 1
end

until the norm of Aⁿ becomes smaller than machine precision. The counter n in the above algorithm is initialized as 2. The matrices Aⁿ and 𝔄A are initialized as the identity matrix.
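
For invertible $A$ the series equals $A^{-1}(\exp(A) - \mathbb{I})$, so $\mathbb{I} + A\,\mathfrak{A}(A) = \exp(A)$ can serve as a sanity check. A sketch (assuming the one-argument 𝔄 is accessible in the same way as in the two-argument example above):

```julia
using GeometricOptimizers
using GeometricOptimizers: 𝔄
using LinearAlgebra # for the matrix exponential
import Random
Random.seed!(123)

A = randn(4, 4)
# the identity I + A𝔄(A) = exp(A) should hold up to machine precision
one(A) + A * 𝔄(A) ≈ exp(A)
```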

source
SimpleSolvers.linesearch_problemMethod
linesearch_problem(problem, cache)

Create a SimpleSolvers.LinesearchProblem for a line search algorithm.

The variable on which this problem depends is $\alpha$.

Example

julia> x = [1, 0., 0.]
3-element Vector{Float64}:
 1.0
 0.0
 0.0

julia> f = x -> sum(x .^ 3 / 6 + x .^ 2 / 2);

julia> obj = OptimizerProblem(f, x);

julia> grad = GradientAutodiff{Float64}(obj.F, length(x));

julia> hess = HessianAutodiff{Float64}(obj.F, length(x));

julia> cache = NewtonOptimizerCache(x);

julia> state = NewtonOptimizerState(x); update!(state, grad, x);

julia> params = (x = state.x,);

julia> update!(cache, state, grad, hess, x);

julia> ls_obj = linesearch_problem(obj, grad, cache);

julia> ls_obj.F(0., params)
0.6666666666666666

julia> ls_obj.D(0., params)
-1.125
Info

Note that in the example above calling update! on the NewtonOptimizerCache requires a SimpleSolvers.Hessian.

source
SimpleSolvers.solve!Method
solve!(x, state, opt)

Solve the optimization problem described by opt::EuclideanOptimizer and store the result in x.

Examples

julia> f(x) = sum(x .^ 2 + x .^ 3 / 3);

julia> x = [1f0, 2f0]
2-element Vector{Float32}:
 1.0
 2.0

julia> opt = EuclideanOptimizer(x, f; algorithm = Newton());

julia> state = NewtonOptimizerState(x);

julia> solve!(x, state, opt)
GeometricOptimizers.OptimizerResult{Float32, Float32, Vector{Float32}, GeometricOptimizers.OptimizerStatus{Float32, Float32}}( * Convergence measures

    |x - x'|               = 7.82e-03
    |x - x'|/|x'|          = 2.56e+02
    |f(x) - f(x')|         = 6.18e-05
    |f(x) - f(x')|/|f(x')| = 6.63e+04
    |g(x) - g(x')|         = 1.57e-02
    |g(x)|                 = 6.10e-05
, Float32[4.6478817f-8, 3.0517578f-5], 9.313341f-10)

julia> x
2-element Vector{Float32}:
 4.6478817f-8
 3.0517578f-5

julia> iteration_number(state)
4

Also see solver_step!.

source