GeometricOptimizers
Documentation for GeometricOptimizers.
Base.Matrix — Method
Matrix(λY::GlobalSection)Put λY into matrix form.
This is not recommended if speed is important!
Use apply_section and global_rep instead!
GeometricOptimizers.AbstractCache — Type
AbstractCacheAbstractCache has subtypes: AdamCache, MomentumCache, GradientCache and _BFGSCache.
All of them can be initialized by providing an array (manifold types are also supported).
GeometricOptimizers.AbstractLieAlgHorMatrix — Type
AbstractLieAlgHorMatrix <: AbstractMatrixAbstractLieAlgHorMatrix is a supertype for various horizontal components of Lie algebras. We usually call this $\mathfrak{g}^\mathrm{hor}$.
See StiefelLieAlgHorMatrix and GrassmannLieAlgHorMatrix for concrete examples.
GeometricOptimizers.AbstractOptimizerProblem — Type
AbstractOptimizerProblem
GeometricOptimizers.AbstractTriangular — Type
AbstractTriangularSee UpperTriangular and LowerTriangular.
GeometricOptimizers.Adam — Type
Adam(η, ρ₁, ρ₂, δ)Make an instance of the Adam Optimizer.
Here the cache consists of first and second moments that are updated as
\[B_1 \gets ((\rho_1 - \rho_1^t)/(1 - \rho_1^t))\cdot{}B_1 + (1 - \rho_1)/(1 - \rho_1^t)\cdot{}\nabla{}L,\]
and
\[B_2 \gets ((\rho_2 - \rho_2^t)/(1 - \rho_2^t))\cdot{}B_2 + (1 - \rho_2)/(1 - \rho_2^t)\cdot\nabla{}L\odot\nabla{}L.\]
The final velocity is computed as:
\[\mathrm{velocity} \gets -\eta{}B_1/\sqrt{B_2 + \delta}.\]
Implementation
The velocity is stored in the input to save memory:
mul!(B, -o.method.η, /ᵉˡᵉ(C.B₁, scalar_add(racᵉˡᵉ(C.B₂), o.method.δ)))
where B is the input to the update! function.
The algorithm and suggested defaults are taken from [1, page 301].
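The moment updates and the velocity computation above can be sketched in plain Julia. This is a minimal standalone illustration; `adam_step!` is a hypothetical name and not part of the package API:

```julia
# Hedged sketch of the Adam update rules above; adam_step! is a
# hypothetical helper, not the package's cache-based implementation.
function adam_step!(B₁, B₂, ∇L, t; η = 1e-3, ρ₁ = 0.9, ρ₂ = 0.99, δ = 1e-8)
    @. B₁ = (ρ₁ - ρ₁^t) / (1 - ρ₁^t) * B₁ + (1 - ρ₁) / (1 - ρ₁^t) * ∇L
    @. B₂ = (ρ₂ - ρ₂^t) / (1 - ρ₂^t) * B₂ + (1 - ρ₂) / (1 - ρ₂^t) * ∇L^2
    @. -η * B₁ / sqrt(B₂ + δ)   # the velocity added to the weights
end
```

In the package itself the velocity is written into the gradient array in place (see the mul! call above) rather than freshly allocated.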
GeometricOptimizers.AdamCache — Type
AdamCache(Y)Store the first and second moment for Y (initialized as zeros).
First and second moments are called B₁ and B₂.
If the cache is called with an instance of a homogeneous space, e.g. the StiefelManifold $St(n,N)$ it initializes the moments as elements of $\mathfrak{g}^\mathrm{hor}$ (StiefelLieAlgHorMatrix).
Examples
using GeometricOptimizers
Y = rand(StiefelManifold, 5, 3)
GeometricOptimizers.AdamCache(Y).B₁
# output
5×5 StiefelLieAlgHorMatrix{Float64, SkewSymMatrix{Float64, Vector{Float64}}, Matrix{Float64}}:
0.0 -0.0 -0.0 -0.0 -0.0
0.0 0.0 -0.0 -0.0 -0.0
0.0 0.0 0.0 -0.0 -0.0
0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0
GeometricOptimizers.AdamWithDecay — Type
AdamWithDecay(n_epochs, η₁=1f-2, η₂=1f-6, ρ₁=9f-1, ρ₂=9.9f-1, δ=1f-8)Make an instance of the Adam Optimizer with weight decay.
All except the first argument (the number of epochs) have defaults.
The difference to the standard Adam is that the learning rate $\eta$ changes in each step. Apart from this time dependency of $\eta$ the two algorithms are equivalent. The learning rate starts at a high value $\eta_1$ and decreases exponentially until it reaches $\eta_2$, following
\[ \eta(t) = \gamma^t\eta_1,\]
where $\gamma = \exp(\log(\eta_2 / \eta_1) / \mathtt{n\_epochs})$, so that $\eta(0) = \eta_1$ and $\eta(\mathtt{n\_epochs}) = \eta_2$.
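The schedule can be written out in a few lines of Julia. These helper names are illustrative only; $\gamma$ is chosen so that $\eta(0) = \eta_1$ and $\eta(\mathtt{n\_epochs}) \approx \eta_2$:

```julia
# Minimal sketch of the exponential decay schedule (hypothetical helpers,
# not exported by GeometricOptimizers). For η₁ > η₂ the factor is < 1,
# so the learning rate decreases over the course of training.
decay_factor(η₁, η₂, n_epochs) = exp(log(η₂ / η₁) / n_epochs)
η_at(t, η₁, η₂, n_epochs) = decay_factor(η₁, η₂, n_epochs)^t * η₁
```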
GeometricOptimizers.BFGS — Type
BFGS(η, δ)Make an instance of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimizer.
η is the learning rate. δ is a stabilization parameter.
GeometricOptimizers.BFGSCache — Type
BFGSCacheThe OptimizerCache for the _BFGS algorithm. Also see update!(::BFGSCache, ::OptimizerState, ::AbstractVector, ::AbstractVector).
GeometricOptimizers.DFPCache — Type
DFPCache <: OptimizerCacheThe OptimizerCache corresponding to the _DFP method.
GeometricOptimizers.DFPState — Type
DFPStateThis is equivalent to BFGSState.
GeometricOptimizers.EuclideanOptimizer — Type
EuclideanOptimizerThe optimizer that stores all the information needed for an optimization problem.
This problem can be solved by calling solve!(::AbstractVector, ::Optimizer).
Keys
algorithm::OptimizerState,
problem::OptimizerProblem,
gradient::SimpleSolvers.Gradient,
hessian::SimpleSolvers.Hessian,
config::SimpleSolvers.Options,
cache::OptimizerCache,
linesearch::SimpleSolvers.Linesearch.
Examples
F(x) = sum(sin.(x) .^ 2)
x = ones(3)
algorithm = Newton()
state = OptimizerState(algorithm, x)
optimizer = EuclideanOptimizer(x, F; algorithm = algorithm, linesearch = Bisection())
solve!(x, state, optimizer)
x
# output
3-element Vector{Float64}:
1.1102230246251565e-16
1.1102230246251565e-16
1.1102230246251565e-16
We note that this same problem may have trouble converging with other line searches:
x = ones(3)
algorithm = Newton()
state = OptimizerState(algorithm, x)
optimizer = EuclideanOptimizer(x, F; algorithm = algorithm, linesearch = Backtracking())
solve!(x, state, optimizer)
x
# output
3-element Vector{Float64}:
1.0
1.0
1.0
GeometricOptimizers.EuclideanOptimizerMethod — Type
EuclideanOptimizerMethod <: SolverMethodThe EuclideanOptimizerMethod is used in EuclideanOptimizer and determines the algorithm that is used.
GeometricOptimizers.GlobalSection — Type
GlobalSection(Y)Construct a global section for Y.
A global section $\lambda$ is a mapping from a homogeneous space $\mathcal{M}$ to the corresponding Lie group $G$ such that
\[\lambda(Y)E = Y.\]
Also see apply_section and global_rep.
Implementation
For an implementation of GlobalSection for a custom array (especially manifolds), the function global_section has to be generalized.
GeometricOptimizers.GradientCache — Type
GradientCache(Y)Do not store anything.
The cache for the SimpleSolvers.Gradient does not consider past information.
GeometricOptimizers.GrassmannLieAlgHorMatrix — Type
GrassmannLieAlgHorMatrix(B::AbstractMatrix, N::Integer, n::Integer)Build an instance of GrassmannLieAlgHorMatrix based on an arbitrary matrix B of size $(N-n)\times{}n$.
GrassmannLieAlgHorMatrix is the horizontal component of the Lie algebra of skew-symmetric matrices (with respect to the canonical metric).
Extended help
The projection here is: $\pi:S \to SE/\sim$ where
\[E = \begin{bmatrix} \mathbb{I}_{n} \\ \mathbb{O}_{(N-n)\times{}n} \end{bmatrix},\]
and the equivalence relation is
\[V_1 \sim V_2 \iff \exists A\in\mathcal{S}_\mathrm{skew}(n) \text{ such that } V_2 = V_1 + \begin{bmatrix} A \\ \mathbb{O} \end{bmatrix}\]
An element of GrassmannLieAlgHorMatrix takes the form:
\[\begin{pmatrix} \bar{\mathbb{O}} & B^T \\ B & \mathbb{O} \end{pmatrix},\]
where $\bar{\mathbb{O}}\in\mathbb{R}^{n\times{}n}$ and $\mathbb{O}\in\mathbb{R}^{(N - n)\times(N-n)}.$
GeometricOptimizers.GrassmannLieAlgHorMatrix — Method
GrassmannLieAlgHorMatrix(D::AbstractMatrix, n::Integer)Take a big matrix as input and build an instance of GrassmannLieAlgHorMatrix.
The integer $N$ in $Gr(n, N)$ here is the number of rows of D.
Extended help
If the constructor is called with a big $N\times{}N$ matrix, then the projection is performed the following way:
\[\begin{pmatrix} A & B_1 \\ B_2 & D \end{pmatrix} \mapsto \begin{pmatrix} \bar{\mathbb{O}} & -B_2^T \\ B_2 & \mathbb{O} \end{pmatrix}.\]
This can also be seen as the operation:
\[D \mapsto \Omega(E, DE - EE^TDE),\]
where $\Omega$ is the horizontal lift GeometricOptimizers.Ω.
GeometricOptimizers.GrassmannManifold — Type
GrassmannManifold <: ManifoldThe GrassmannManifold is based on the StiefelManifold.
GeometricOptimizers.HessianBFGS — Type
HessianBFGS <: HessianA struct derived from SimpleSolvers.Hessian to be used for an EuclideanOptimizer.
GeometricOptimizers.HessianDFP — Type
HessianDFP <: HessianThe SimpleSolvers.Hessian corresponding to the _DFP method.
GeometricOptimizers.IterativeHessian — Type
IterativeHessian <: HessianAn abstract type derived from SimpleSolvers.Hessian. Its main purpose is defining a supertype that encompasses HessianBFGS and HessianDFP for dispatch.
GeometricOptimizers.LowerTriangular — Type
LowerTriangular(S::AbstractVector, n::Int)Build a lower-triangular matrix from a vector.
A lower-triangular matrix is an $n\times{}n$ matrix that has zeros on the diagonal and in the upper triangle.
The data are stored in a vector $S$ similarly to other matrices. See UpperTriangular, SkewSymMatrix and SymmetricMatrix.
The struct has two fields: S and n. The first stores all the entries of the matrix in a sparse fashion (in a vector) and the second is the dimension $n$ for $A\in\mathbb{R}^{n\times{}n}$.
Examples
using GeometricOptimizers
S = [1, 2, 3, 4, 5, 6]
LowerTriangular(S, 4)
# output
4×4 LowerTriangular{Int64, Vector{Int64}}:
0 0 0 0
1 0 0 0
2 3 0 0
4 5 6 0
GeometricOptimizers.LowerTriangular — Method
LowerTriangular(A::AbstractMatrix)Build a lower-triangular matrix from a matrix.
This is done by taking the lower left of that matrix.
Examples
using GeometricOptimizers
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
LowerTriangular(M)
# output
4×4 LowerTriangular{Int64, Vector{Int64}}:
0 0 0 0
5 0 0 0
9 10 0 0
13 14 15 0
GeometricOptimizers.Manifold — Type
Manifold <: AbstractMatrixA manifold in GeometricOptimizers is a subtype of AbstractMatrix. All manifolds are matrix manifolds and are therefore stored as matrices. More details can be found in the docstrings for the StiefelManifold and the GrassmannManifold.
GeometricOptimizers.Momentum — Type
Momentum(η, α)Make an instance of the momentum optimizer.
The momentum optimizer is similar to the SimpleSolvers.Gradient. It however has a nontrivial cache that stores past history (see MomentumCache). The cache is updated via:
\[ B^{\mathrm{cache}} \gets \alpha{}B^{\mathrm{cache}} + \nabla_\mathrm{weights}L\]
and then the final velocity is computed as
\[ \mathrm{velocity} \gets - \eta{}B^{\mathrm{cache}}.\]
Implementation
To save memory the velocity is stored in the input $\nabla_WL$. This is similar to the case of the SimpleSolvers.Gradient.
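The cache update and velocity computation can be sketched as follows. `momentum_step!` is a hypothetical standalone helper, not the package's in-place implementation:

```julia
# Standalone sketch of the momentum update described above;
# momentum_step! is an illustrative name, not part of the API.
function momentum_step!(B, ∇L; η = 1e-3, α = 0.9)
    @. B = α * B + ∇L   # update the cache with the new gradient
    @. -η * B           # the velocity added to the weights
end
```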
GeometricOptimizers.MomentumCache — Type
MomentumCache(Y)Store the moment for Y (initialized as zeros).
The moment is called B.
If the cache is called with an instance of a Manifold it initializes the moments as elements of $\mathfrak{g}^\mathrm{hor}$ (AbstractLieAlgHorMatrix).
See AdamCache.
GeometricOptimizers.NewtonOptimizerCache — Type
NewtonOptimizerCache <: OptimizerCacheKeys
x: current iterate (this stores the guess called by the functions generated with linesearch_problem),
Δx: direction of the optimization step (difference between x and x̄); this is obtained by multiplying rhs with the inverse of the Hessian,
g: gradient value (this stores the gradient associated with x called by the derivative part of linesearch_problem),
Δg: gradient difference (difference between g and ḡ); this is used for computing the OptimizerStatus,
rhs: the right-hand side used to compute the update,
H: the Hessian matrix evaluated at x.
Also compare this to SimpleSolvers.NonlinearSolverCache.
GeometricOptimizers.NewtonOptimizerState — Type
NewtonOptimizerState <: OptimizerStateThe optimizer state is needed to update the EuclideanOptimizer. This is different from OptimizerStatus and OptimizerResult which serve as diagnostic tools.
We note that this is also used for the _BFGS and the _DFP optimizer.
Keys
x, x̄, g, ḡ, f, f̄.
GeometricOptimizers.Optimizer — Type
Optimizer(method, cache, step, retraction)Store the method (e.g. Adam with corresponding hyperparameters), the cache (e.g. AdamCache), the optimization step and the retraction.
It takes as input an optimization method and the parameters of a network.
Before one can call Optimizer, an OptimizerMethod that stores all the hyperparameters of the optimizer needs to be specified.
GeometricOptimizers.Optimizer — Method
Optimizer(method, nn_params)Allocate the cache for a specific method and nn_params for an instance of Optimizer.
Internally this calls init_optimizer_cache.
An equivalent constructor is
Optimizer(method, nn::NeuralNetwork)
Arguments
The optional keyword argument is the retraction. By default this is cayley.
GeometricOptimizers.OptimizerCache — Type
OptimizerCacheSee e.g. NewtonOptimizerCache and BFGSCache.
GeometricOptimizers.OptimizerMethod — Type
OptimizerMethodEach Optimizer has to be called with an OptimizerMethod. This specifies how the neural network weights are updated in each optimization step.
GeometricOptimizers.OptimizerProblem — Type
OptimizerProblem <: AbstractOptimizerProblemUsed in EuclideanOptimizer. Also compare this to SimpleSolvers.NonlinearProblem.
Examples
julia> x = ones(3); F(x) = sum(sin.(x) .^ 2)
F (generic function with 1 method)
julia> OptimizerProblem(F, x)
OptimizerProblem{Float64, typeof(F), Missing, Missing}(F, missing, missing)
If OptimizerProblem is called on a single function, the fields for SimpleSolvers.Gradient and SimpleSolvers.Hessian are missing.
GeometricOptimizers.OptimizerResult — Type
OptimizerResultServes as a diagnostic tool for the EuclideanOptimizer and is the return argument of solve!.
Keys
status::OptimizerStatus: current status of the optimization,
x: solution,
f: function value at solution.
GeometricOptimizers.OptimizerState — Type
An OptimizerState is a data structure that is used to dispatch on different algorithms.
It needs to implement three methods,
initialize!(alg::OptimizerState, ::AbstractVector)
update!(alg::OptimizerState, ::AbstractVector)
solver_step!(::AbstractVector, alg::OptimizerState)
that initialize and update the state of the algorithm and perform an actual optimization step.
Further the following convenience methods should be implemented,
problem(alg::OptimizerState)
gradient(alg::OptimizerState)
hessian(alg::OptimizerState)
linesearch(alg::OptimizerState)
which return the problem to optimize, its gradient and (approximate) Hessian, as well as the linesearch algorithm used in conjunction with the optimization algorithm (if any).
See NewtonOptimizerState for a struct that was derived from OptimizerState.
GeometricOptimizers.OptimizerStatus — Type
OptimizerStatusContains residuals (relative and absolute) and various convergence properties.
This is also used in OptimizerResult.
Examples
x = ones(3)
state = NewtonOptimizerState(x)
cache = NewtonOptimizerCache(x)
f = 1.
config = Options()
OptimizerStatus(state, cache, f; config = config)
# output
* Convergence measures
|x - x'| = NaN
|x - x'|/|x'| = NaN
|f(x) - f(x')| = NaN
|f(x) - f(x')|/|f(x')| = NaN
|g(x) - g(x')| = NaN
|g(x)| = NaN
GeometricOptimizers.SkewSymMatrix — Type
SkewSymMatrix(S::AbstractVector, n::Integer)Instantiate a skew-symmetric matrix with information stored in vector S.
A skew-symmetric matrix $A$ is a matrix with $A^T = -A$.
Internally the struct saves a vector $S$ of size $n(n-1)\div2$. The conversion is done the following way:
\[[A]_{ij} = \begin{cases} 0 & \text{if $i=j$} \\ S[( (i-2) (i-1) ) \div 2 + j] & \text{if $i>j$}\\ -S[( (j-2) (j-1) ) \div 2 + i] & \text{else}. \end{cases}\]
So $S$ stores a string of vectors taken from $A$: $S = [\tilde{a}_1, \tilde{a}_2, \ldots, \tilde{a}_n]$ with $\tilde{a}_i = [[A]_{i1},[A]_{i2},\ldots,[A]_{i(i-1)}]$.
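The index map above can be written out directly. The helper `skew_entry` below is hypothetical (the package implements this via `getindex` on SkewSymMatrix); note that for $i<j$ the stored value enters with a minus sign, as required by skew-symmetry:

```julia
# Sketch of the vector-to-matrix index map for SkewSymMatrix;
# skew_entry is an illustrative helper, not part of the package API.
function skew_entry(S::AbstractVector, i::Integer, j::Integer)
    i == j && return zero(eltype(S))
    i > j ? S[(i - 2) * (i - 1) ÷ 2 + j] : -S[(j - 2) * (j - 1) ÷ 2 + i]
end

S = [1, 2, 3, 4, 5, 6]
skew_entry(S, 2, 1)   # 1, matching the example below
skew_entry(S, 1, 2)   # -1
```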
Also see SymmetricMatrix, LowerTriangular and UpperTriangular.
Examples
using GeometricOptimizers
S = [1, 2, 3, 4, 5, 6]
SkewSymMatrix(S, 4)
# output
4×4 SkewSymMatrix{Int64, Vector{Int64}}:
0 -1 -2 -4
1 0 -3 -5
2 3 0 -6
4 5 6 0
GeometricOptimizers.SkewSymMatrix — Method
SkewSymMatrix(A::AbstractMatrix)Perform 0.5 * (A - A') and store the matrix in an efficient way (as a vector with $n(n-1)/2$ entries).
If the constructor is called with a matrix as input it returns a skew-symmetric matrix via the projection:
\[A \mapsto \frac{1}{2}(A - A^T).\]
Examples
using GeometricOptimizers
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
SkewSymMatrix(M)
# output
4×4 SkewSymMatrix{Float64, Vector{Float64}}:
0.0 -1.5 -3.0 -4.5
1.5 0.0 -1.5 -3.0
3.0 1.5 0.0 -1.5
4.5 3.0 1.5 0.0
Extended help
Note that the constructor is designed in such a way that it always returns matrices of type SkewSymMatrix{<:AbstractFloat} when called with a matrix, even if this matrix is of type AbstractMatrix{<:Integer}.
If the user wishes to allocate a matrix SkewSymMatrix{<:Integer} then call:
SkewSymMatrix(::AbstractVector, n::Integer)
Note that this is different from LowerTriangular and UpperTriangular as no projection takes place there.
GeometricOptimizers.StiefelLieAlgHorMatrix — Type
StiefelLieAlgHorMatrix(A::SkewSymMatrix, B::AbstractMatrix, N::Integer, n::Integer)Build an instance of StiefelLieAlgHorMatrix based on a skew-symmetric matrix A and an arbitrary matrix B.
An element of StiefelLieAlgHorMatrix takes the form:
\[\begin{pmatrix} A & B^T \\ B & \mathbb{O} \end{pmatrix},\]
where $A$ is skew-symmetric (this is SkewSymMatrix in GeometricOptimizers).
Also see GrassmannLieAlgHorMatrix.
Extended help
StiefelLieAlgHorMatrix is the horizontal component of the Lie algebra of skew-symmetric matrices (with respect to the canonical metric).
The projection here is: $\pi:S \to SE$ where
\[E = \begin{bmatrix} \mathbb{I}_{n} \\ \mathbb{O}_{(N-n)\times{}n} \end{bmatrix}.\]
The matrix $E$ is implemented under StiefelProjection in GeometricOptimizers.
GeometricOptimizers.StiefelLieAlgHorMatrix — Method
StiefelLieAlgHorMatrix(D::AbstractMatrix, n::Integer)Take a big matrix as input and build an instance of StiefelLieAlgHorMatrix.
The integer $N$ in $St(n, N)$ is the number of rows of D.
Extended help
If the constructor is called with a big $N\times{}N$ matrix, then the projection is performed the following way:
\[\begin{pmatrix} A & B_1 \\ B_2 & D \end{pmatrix} \mapsto \begin{pmatrix} \mathrm{skew}(A) & -B_2^T \\ B_2 & \mathbb{O} \end{pmatrix}.\]
The operation $\mathrm{skew}:\mathbb{R}^{n\times{}n}\to\mathcal{S}_\mathrm{skew}(n)$ is the skew-symmetrization operation. This is equivalent to calling SkewSymMatrix with an $n\times{}n$ matrix.
This can also be seen as the operation:
\[D \mapsto \Omega(E, DE) = \mathrm{skew}\left(2 \left(\mathbb{I} - \frac{1}{2} E E^T \right) DE E^T\right).\]
Also see GeometricOptimizers.Ω.
GeometricOptimizers.StiefelManifold — Type
StiefelManifold <: ManifoldAn implementation of the Stiefel manifold [2]. The Stiefel manifold is the collection of all matrices $Y\in\mathbb{R}^{N\times{}n}$ whose columns are orthonormal, i.e.
\[ St(n, N) = \{Y: Y^TY = \mathbb{I}_n \}.\]
The Stiefel manifold can be shown to have manifold structure (as the name suggests) and this is heavily used in GeometricOptimizers. It is further a compact space. More information can be found in the docstrings for rgrad(::StiefelManifold, ::AbstractMatrix) and metric(::StiefelManifold, ::AbstractMatrix, ::AbstractMatrix).
GeometricOptimizers.StiefelProjection — Type
StiefelProjection(backend, T, N, n)Make a matrix of the form $\begin{bmatrix} \mathbb{I} & \mathbb{O} \end{bmatrix}^T$ for a specific backend and data type.
An array that essentially does vcat(I(n), zeros(N-n, n)) with GPU support.
Extended help
An instance of StiefelProjection should technically also belong to StiefelManifold.
GeometricOptimizers.StiefelProjection — Method
StiefelProjection(A::AbstractMatrix)Extract necessary information from A and build an instance of StiefelProjection.
Necessary information here refers to the backend, the data type and the size of the matrix.
GeometricOptimizers.StiefelProjection — Method
StiefelProjection(B::AbstractLieAlgHorMatrix)Extract necessary information from B and build an instance of StiefelProjection.
Necessary information here refers to the backend, the data type and the size of the matrix.
The size is queried through B.N and B.n.
Examples
using GeometricOptimizers
using GeometricOptimizers: StiefelProjection
B₁ = rand(StiefelLieAlgHorMatrix, 5, 2)
B₂ = rand(GrassmannLieAlgHorMatrix, 5, 2)
E = [1. 0.; 0. 1.; 0. 0.; 0. 0.; 0. 0.]
StiefelProjection(B₁) ≈ StiefelProjection(B₂) ≈ E
# output
true
GeometricOptimizers.SymmetricMatrix — Type
SymmetricMatrix(S::AbstractVector, n::Integer)Instantiate a symmetric matrix with information stored in vector S.
A SymmetricMatrix $A$ is a matrix with $A^T = A$.
Internally the struct saves a vector $S$ of size $n(n+1)\div2$. The conversion is done the following way:
\[[A]_{ij} = \begin{cases} S[( (i-1) i ) \div 2 + j] & \text{if $i\geq{}j$}\\ S[( (j-1) j ) \div 2 + i] & \text{else}. \end{cases}\]
So $S$ stores a string of vectors taken from $A$: $S = [\tilde{a}_1, \tilde{a}_2, \ldots, \tilde{a}_n]$ with $\tilde{a}_i = [[A]_{i1},[A]_{i2},\ldots,[A]_{ii}]$.
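As for SkewSymMatrix, the index map can be sketched directly. `sym_entry` is an illustrative helper, not part of the package API:

```julia
# Sketch of the vector-to-matrix index map for SymmetricMatrix
# (hypothetical helper; the package implements this via getindex).
sym_entry(S::AbstractVector, i::Integer, j::Integer) =
    i ≥ j ? S[(i - 1) * i ÷ 2 + j] : S[(j - 1) * j ÷ 2 + i]

S = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sym_entry(S, 3, 1)   # 4, matching entry (3,1) in the example below
sym_entry(S, 1, 3)   # 4, by symmetry
```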
Also see SkewSymMatrix, LowerTriangular and UpperTriangular.
Examples
using GeometricOptimizers
S = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
SymmetricMatrix(S, 4)
# output
4×4 SymmetricMatrix{Int64, Vector{Int64}}:
1 2 4 7
2 3 5 8
4 5 6 9
7 8 9 10
GeometricOptimizers.SymmetricMatrix — Method
SymmetricMatrix(A::AbstractMatrix)Perform a projection and store the matrix in an efficient way (as a vector with $n(n+1)/2$ entries).
If the constructor is called with a matrix as input it returns a symmetric matrix via the projection:
\[A \mapsto \frac{1}{2}(A + A^T).\]
Examples
using GeometricOptimizers
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
SymmetricMatrix(M)
# output
4×4 SymmetricMatrix{Float64, Vector{Float64}}:
1.0 3.5 6.0 8.5
3.5 6.0 8.5 11.0
6.0 8.5 11.0 13.5
8.5 11.0 13.5 16.0
Extended help
Note that the constructor is designed in such a way that it always returns matrices of type SymmetricMatrix{<:AbstractFloat} when called with a matrix, even if this matrix is of type AbstractMatrix{<:Integer}.
If the user wishes to allocate a matrix SymmetricMatrix{<:Integer} then call
SymmetricMatrix(::AbstractVector, n::Integer)
Note that this is different from LowerTriangular and UpperTriangular as no projection takes place there.
GeometricOptimizers.UpperTriangular — Type
UpperTriangular(S::AbstractVector, n::Int)Build an upper-triangular matrix from a vector.
An upper-triangular matrix is an $n\times{}n$ matrix that has zeros on the diagonal and in the lower triangle.
The data are stored in a vector $S$ similarly to other matrices. See LowerTriangular, SkewSymMatrix and SymmetricMatrix.
The struct has two fields: S and n. The first stores all the entries of the matrix in a sparse fashion (in a vector) and the second is the dimension $n$ for $A\in\mathbb{R}^{n\times{}n}$.
Examples
using GeometricOptimizers
S = [1, 2, 3, 4, 5, 6]
UpperTriangular(S, 4)
# output
4×4 UpperTriangular{Int64, Vector{Int64}}:
0 1 2 4
0 0 3 5
0 0 0 6
0 0 0 0
GeometricOptimizers.UpperTriangular — Method
UpperTriangular(A::AbstractMatrix)Build an upper-triangular matrix from a matrix.
This is done by taking the upper right of that matrix.
Examples
using GeometricOptimizers
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
UpperTriangular(M)
# output
4×4 UpperTriangular{Int64, Vector{Int64}}:
0 2 3 4
0 0 7 8
0 0 0 12
0 0 0 0
GeometricOptimizers._BFGS — Type
Algorithm taken from [3].
GeometricOptimizers._BFGSCache — Type
BFGSCache(B)Make the cache for the BFGS optimizer based on the array B.
It stores an array for the gradient of the previous time step B and the inverse of the Hessian matrix H.
The cache for the inverse of the Hessian is initialized with the identity. The cache for the previous gradient information is initialized with the zero vector.
Note that the cache for H is changed iteratively, whereas the cache for B is newly assigned at every time step.
GeometricOptimizers._DFP — Type
Algorithm taken from [3].
GeometricOptimizers._Gradient — Type
Gradient(η)Make an instance of a gradient optimizer.
This is the simplest neural network optimizer. It has no cache and computes the final velocity as:
\[ \mathrm{velocity} \gets - \eta\nabla_\mathrm{weight}L.\]
Implementation
The operations are done as memory efficiently as possible. This means the provided $\nabla_WL$ is mutated via:
rmul!(∇L, -method.η)
Base.rand — Method
rand(manifold_type, N, n)Draw random elements from the Stiefel and the Grassmann manifold.
Because both of these manifolds are compact spaces we can sample them uniformly [4].
Examples
When we call ...
using GeometricOptimizers
using GeometricOptimizers: _round # hide
import Random
Random.seed!(123)
N, n = 5, 3
Y = rand(StiefelManifold{Float32}, N, n)
_round(Y; digits = 5) # hide
# output
5×3 StiefelManifold{Float32, Matrix{Float32}}:
-0.27575 0.32991 0.77275
-0.62485 -0.33224 -0.0686
-0.69333 0.36724 -0.18988
-0.09295 -0.73145 0.46064
0.2102 0.33301 0.38717
... the sampling is done by first allocating a random matrix of size $N\times{}n$ via Y = randn(Float32, N, n).
We then perform a QR decomposition Q, R = qr(Y) with the qr function from the LinearAlgebra package (this is using Householder reflections internally).
The final output are then the first n columns of the Q matrix.
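The QR-based sampling described above can be reproduced in a few lines with the LinearAlgebra standard library. This is a sketch of the procedure, not the package's exact code:

```julia
using LinearAlgebra
import Random
Random.seed!(123)

# Sketch of uniform sampling on the Stiefel manifold: QR-decompose a
# Gaussian matrix and keep the thin Q factor (its first n columns).
N, n = 5, 3
A = randn(Float64, N, n)
Q = Matrix(qr(A).Q)   # thin Q factor: the first n columns of Q
Q' * Q ≈ I(n)         # the columns are orthonormal
```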
Base.vec — Method
vec(A::StiefelLieAlgHorMatrix)Vectorize A.
Examples
using GeometricOptimizers
A = SkewSymMatrix([1, ], 2)
B = [2 3; ]
B̄ = StiefelLieAlgHorMatrix(A, B, 3, 2)
B̄ |> vec
# output
vcat(1-element Vector{Int64}, 2-element Vector{Int64}):
1
2
3
Implementation
This is using Vcat from the package LazyArrays.
GeometricBase.update! — Method
update!(opt, x)Update the cache contained in the optimizer opt. See e.g. update!(::NewtonOptimizerCache, ::OptimizerState, ::Gradient, ::Hessian, ::AbstractVector).
GeometricBase.update! — Method
update!(cache, state, x, g)Update the NewtonOptimizerCache based on x and g.
GeometricBase.update! — Method
update!(cache::NewtonOptimizerCache, x, g, hes)Update an instance of NewtonOptimizerCache based on x.
This is used in update!(::OptimizerState, ::AbstractVector).
This sets:
\[\begin{aligned} x^\mathtt{cache} & \gets x, \\ g^\mathtt{cache} & \gets g, \\ \mathrm{rhs}^\mathtt{cache} & \gets -g, \\ H^\mathtt{cache} & \gets H(x), \\ \delta^\mathtt{cache} & \gets (H^\mathtt{cache})^{-1}\mathrm{rhs}^\mathtt{cache}, \end{aligned}\]
where we wrote $H$ for the Hessian (i.e. the input argument hes).
GeometricBase.update! — Method
update!(state::NewtonOptimizerState, gradient, x)Update an instance of NewtonOptimizerState based on x and gradient, where gradient is of type SimpleSolvers.Gradient.
Examples
If we only call update! once there are still NaNs for x̄, ḡ and f̄.
f(x) = sum(x.^2)
x = [1., 2.]
state = NewtonOptimizerState(x)
grad = GradientAutodiff{Float64}(f, length(x))
update!(state, grad, x)
# output
NewtonOptimizerState{Float64, Vector{Float64}, Vector{Float64}}(0, [1.0, 2.0], [NaN, NaN], [2.0, 4.0], [NaN, NaN], 5.0, NaN)
GeometricBase.update! — Method
update!(o, cache, B)Update the cache and output a final velocity that is stored in B.
Note that $B\in\mathfrak{g}^\mathrm{hor}$ in general.
In the manifold case the final velocity is the input to a retraction.
GeometricBase.update! — Method
update!(o::Optimizer{<:BFGS}, C, B)Perform an update with the BFGS optimizer.
C is the cache, B contains the gradient information (the output of global_rep in general).
First we compute the final velocity with
vecS = -o.method.η * C.H * vec(B)
and then we update H:
C.H .= (𝕀 - ρ * SY) * C.H * (𝕀 - ρ * SY') + ρ * vecS * vecS'
where SY is vecS * Y' and 𝕀 is the identity.
Implementation
For stability we use δ for computing ρ:
ρ = 1. / (vecS' * Y + o.method.δ)
This is similar to the Adam optimizer.
Extended help
If we have weights on a Manifold then the updates are slightly more difficult. In this case the vec operation has to be generalized to the corresponding global tangent space.
GeometricBase.update! — Method
update!(cache, x, g)Update the BFGSCache based on x and g.
Extended help
The update rule used here can be found in [5] and [3]:
It does:
\[\begin{aligned} \delta & \gets x^{(k)} - x^{(k-1)}, \\ \gamma & \gets \nabla{}f^{(k)} - \nabla{}f^{(k-1)}, \\ T_1 & \gets \delta\gamma^TQ, \\ T_2 & \gets Q\gamma\delta^T, \\ T_3 & \gets \left(1 + \frac{\gamma^TQ\gamma}{\delta^T\gamma}\right)\delta\delta^T,\\ Q & \gets Q - (T_1 + T_2 - T_3)/(\delta^T\gamma). \end{aligned}\]
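For the Euclidean (dense-matrix) case, the update rule above can be sketched directly. `bfgs_inverse_update!` is a hypothetical helper, not the cache code the package actually uses:

```julia
using LinearAlgebra

# Dense sketch of the inverse-Hessian update above
# (illustrative helper, not the package's BFGSCache implementation).
function bfgs_inverse_update!(Q::AbstractMatrix, δ::AbstractVector, γ::AbstractVector)
    δγ = dot(δ, γ)                              # δᵀγ
    T₁ = δ * γ' * Q
    T₂ = Q * γ * δ'
    T₃ = (1 + dot(γ, Q * γ) / δγ) * δ * δ'
    Q .-= (T₁ .+ T₂ .- T₃) ./ δγ
    Q
end
```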
GeometricBase.update! — Method
GeometricOptimizers.apply_section! — Method
apply_section!(Y::AT, λY::GlobalSection{T, AT}, Y₂::AT) where {T, AT<:StiefelManifold{T}}Apply λY to Y₂ and store the result in Y.
This is the inplace version of apply_section.
GeometricOptimizers.apply_section — Method
apply_section(λY::GlobalSection{T, AT}, Y₂::AT) where {T, AT <: StiefelManifold{T}}Apply λY to Y₂.
Mathematically this is the group action of the element $\lambda{}Y\in{}G$ on the element $Y_2$ of the homogeneous space $\mathcal{M}$.
Internally it calls apply_section!.
GeometricOptimizers.cayley — Method
cayley(B̄::GrassmannLieAlgHorMatrix)Compute the Cayley retraction of B̄.
This is equivalent to the method of cayley for StiefelLieAlgHorMatrix.
GeometricOptimizers.cayley — Method
cayley(B̄::StiefelLieAlgHorMatrix)Compute the Cayley retraction of B̄.
Implementation
Internally this is using
\[\mathrm{Cayley}(\bar{B}) = \mathbb{I} + \frac{1}{2} B' (\mathbb{I}_{2n} - \frac{1}{2} (B'')^T B')^{-1} (B'')^T (\mathbb{I} + \frac{1}{2} \bar{B}),\]
with
\[\bar{B} = \begin{bmatrix} A & -B^T \\ B & \mathbb{O} \end{bmatrix} = \begin{bmatrix} \frac{1}{2}A & \mathbb{I} \\ B & \mathbb{O} \end{bmatrix} \begin{bmatrix} \mathbb{I} & \mathbb{O} \\ \frac{1}{2}A & -B^T \end{bmatrix} =: B'(B'')^T,\]
i.e. $\bar{B}$ is expressed as a product of two $N\times{}2n$ matrices.
GeometricOptimizers.cayley — Method
cayley(Y::Manifold, Δ)Take as input an element Y of a manifold and a tangent vector Δ in the corresponding tangent space and compute the Cayley retraction.
In different notation: take as input an element $x$ of $\mathcal{M}$ and an element of $T_x\mathcal{M}$ and return $\mathrm{Cayley}(v_x).$
Examples
using GeometricOptimizers
Y = StiefelManifold([1. 0. 0.;]' |> Matrix)
Δ = [0. .5 0.;]' |> Matrix
Y₂ = GeometricOptimizers.cayley(Y, Δ)
Y₂' * Y₂ ≈ [1.;]
# output
true
See the example in geodesic(::Manifold{T}, ::AbstractMatrix{T}) where T.
GeometricOptimizers.convergence_measures — Method
convergence_measures(status, config)Check whether the optimizer has converged.
Here status is an OptimizerStatus object and config is a SimpleSolvers.Options object.
GeometricOptimizers.geodesic — Method
geodesic(B̄::GrassmannLieAlgHorMatrix)Compute the geodesic of an element in GrassmannLieAlgHorMatrix.
This is equivalent to the method of geodesic for StiefelLieAlgHorMatrix.
GeometricOptimizers.geodesic — Method
geodesic(B̄::StiefelLieAlgHorMatrix)Compute the geodesic of an element in StiefelLieAlgHorMatrix.
Implementation
Internally this is using:
\[\mathbb{I} + B'\mathfrak{A}(B', B'')(B'')^T,\]
with
\[\bar{B} = \begin{bmatrix} A & -B^T \\ B & \mathbb{O} \end{bmatrix} = \begin{bmatrix} \frac{1}{2}A & \mathbb{I} \\ B & \mathbb{O} \end{bmatrix} \begin{bmatrix} \mathbb{I} & \mathbb{O} \\ \frac{1}{2}A & -B^T \end{bmatrix} =: B'(B'')^T.\]
This is using a computationally efficient version of the matrix exponential $\mathfrak{A}$.
GeometricOptimizers.geodesic — Method
geodesic(Y::Manifold, Δ)Take as input an element Y of a manifold and a tangent vector Δ in the corresponding tangent space and compute the geodesic (exponential map).
In different notation: take as input an element $x$ of $\mathcal{M}$ and an element of $T_x\mathcal{M}$ and return $\mathtt{geodesic}(x, v_x) = \exp(v_x).$
Examples
using GeometricOptimizers
Y = StiefelManifold([1. 0. 0.;]' |> Matrix)
Δ = [0. .5 0.;]' |> Matrix
Y₂ = GeometricOptimizers.geodesic(Y, Δ)
Y₂' * Y₂ ≈ [1.;]
# output
true
Implementation
Internally this geodesic method calls geodesic(::StiefelLieAlgHorMatrix).
GeometricOptimizers.global_rep — Method
global_rep(λY::GlobalSection{T, AT}, Δ::AbstractMatrix{T}) where {T, AT<:GrassmannManifold{T}}Express Δ (an element of the tangent space of Y) as an instance of GrassmannLieAlgHorMatrix.
The method global_rep for GrassmannManifold is similar to that for StiefelManifold.
Examples
using GeometricOptimizers
using GeometricOptimizers: _round
import Random
Random.seed!(123)
Y = rand(GrassmannManifold, 6, 3)
Δ = rgrad(Y, randn(6, 3))
λY = GlobalSection(Y)
_round(global_rep(λY, Δ); digits = 3)
# output
6×6 GrassmannLieAlgHorMatrix{Float64, Matrix{Float64}}:
0.0 0.0 0.0 0.981 -2.058 0.4
0.0 0.0 0.0 -0.424 0.733 -0.919
0.0 0.0 0.0 -1.815 1.409 1.085
-0.981 0.424 1.815 0.0 0.0 0.0
2.058 -0.733 -1.409 0.0 0.0 0.0
-0.4 0.919 -1.085 0.0 0.0 0.0
GeometricOptimizers.global_rep — Method
global_rep(λY::GlobalSection{T, AT}, Δ::AbstractMatrix{T}) where {T, AT<:StiefelManifold{T}}Express Δ (an element of the tangent space of Y) as an instance of StiefelLieAlgHorMatrix.
This maps an element from $T_Y\mathcal{M}$ to an element of $\mathfrak{g}^\mathrm{hor}$.
These two spaces are isomorphic, and the isomorphism is established through $\lambda(Y)\in{}G$ via:
\[T_Y\mathcal{M} \to \mathfrak{g}^{\mathrm{hor}}, \Delta \mapsto \lambda(Y)^{-1}\Omega(Y, \Delta)\lambda(Y).\]
Also see GeometricOptimizers.Ω.
Examples
using GeometricOptimizers
using GeometricOptimizers: _round
import Random
Random.seed!(123)
Y = rand(StiefelManifold, 6, 3)
Δ = rgrad(Y, randn(6, 3))
λY = GlobalSection(Y)
_round(global_rep(λY, Δ); digits = 3)
# output
6×6 StiefelLieAlgHorMatrix{Float64, SkewSymMatrix{Float64, Vector{Float64}}, Matrix{Float64}}:
0.0 0.679 1.925 0.981 -2.058 0.4
-0.679 0.0 0.298 -0.424 0.733 -0.919
-1.925 -0.298 0.0 -1.815 1.409 1.085
-0.981 0.424 1.815 0.0 0.0 0.0
2.058 -0.733 -1.409 0.0 0.0 0.0
-0.4 0.919 -1.085 0.0 0.0 0.0
Implementation
The function global_rep does not in fact perform the entire map $\lambda(Y)^{-1}\Omega(Y, \Delta)\lambda(Y)$ but only
\[\Delta \mapsto \mathrm{skew}(Y^T\Delta),\]
to get the small skew-symmetric matrix $A\in\mathcal{S}_\mathrm{skew}(n)$ and
\[\Delta \mapsto (\lambda(Y)_{[1:N, n:N]}^T \Delta)_{[1:(N-n), 1:n]},\]
to get the arbitrary matrix $B\in\mathbb{R}^{(N-n)\times{}n}$.
GeometricOptimizers.global_section — Method
global_section(Y::GrassmannManifold)Compute a matrix of size $N\times(N-n)$ whose columns are orthogonal to the columns in Y.
The method global_section for the Grassmann manifold is equivalent to that for the StiefelManifold (we represent the Grassmann manifold as an embedding in the Stiefel manifold).
See the documentation for global_section(Y::StiefelManifold{T}) where T.
GeometricOptimizers.global_section — Method
global_section(Y::StiefelManifold)Compute a matrix of size $N\times(N-n)$ whose columns are orthogonal to the columns in Y.
This matrix is also called $Y_\perp$ [6–8].
Examples
using GeometricOptimizers
using GeometricOptimizers: global_section
import Random
Random.seed!(123)
Y = StiefelManifold([1. 0.; 0. 1.; 0. 0.; 0. 0.])
round.(Matrix(global_section(Y)); digits = 3)
# output
4×2 Matrix{Float64}:
0.0 -0.0
0.0 0.0
0.936 -0.353
0.353 0.936
Further note that we convert the QRCompactWYQ object to a Matrix before we display it.
Implementation
The implementation is done with a QR decomposition (LinearAlgebra.qr!). Internally we do:
A = randn(N, N - n) # or the gpu equivalent
A = A - Y.A * (Y.A' * A)
qr!(A).Q
GeometricOptimizers.gradient — Method
gradient(cache)Return the stored gradient (array) of an instance of BFGSCache
GeometricOptimizers.gradient — Method
gradient(cache)Return the stored gradient (array) of an instance of DFPCache
GeometricOptimizers.gradient — Method
gradient(::NewtonOptimizerCache)Return the stored gradient (array) of an instance of NewtonOptimizerCache
GeometricOptimizers.init_optimizer_cache — Method
init_optimizer_cache(method, x)Initialize the optimizer cache based on input x for the given method.
GeometricOptimizers.isaOptimizerState — Method
isaOptimizerState(alg)Verify if an object implements the OptimizerState interface.
GeometricOptimizers.meets_stopping_criteria — Method
meets_stopping_criteria(status, config, iterations)Check if the optimizer has converged.
Implementation
meets_stopping_criteria checks if one of the following is true:
- converged (the output of SimpleSolvers.assess_convergence) is true and iterations $\geq$ config.min_iterations,
- if config.allow_f_increases is false: status.f_increased is true,
- iterations $\geq$ config.max_iterations,
- status.rxₐ $>$ config.x_abstol_break,
- status.rxᵣ $>$ config.x_reltol_break,
- status.rfₐ $>$ config.f_abstol_break,
- status.rfᵣ $>$ config.f_reltol_break,
- status.rg $>$ config.g_restol_break,
- status.x_isnan, status.f_isnan or status.g_isnan is true.
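The short-circuit logic can be sketched as follows. This is a simplified illustration, not the package's implementation: the field and option names follow the list above, and status/config can be any objects with those fields (here, NamedTuples):

```julia
# Simplified sketch of the stopping logic; only a subset of the listed
# criteria is shown, and `converged` is assumed to be stored on status.
function meets_stopping_criteria_sketch(status, config, iterations)
    converged = status.converged && iterations ≥ config.min_iterations
    f_increase_break = !config.allow_f_increases && status.f_increased
    diverged = iterations ≥ config.max_iterations ||
               status.rxₐ > config.x_abstol_break ||
               status.rfₐ > config.f_abstol_break ||
               status.x_isnan || status.f_isnan || status.g_isnan
    converged || f_increase_break || diverged
end
```

Note that the function returns true both on convergence and on divergence; the caller distinguishes the two cases via the status object.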
GeometricOptimizers.metric — Method
metric(Y::GrassmannManifold, Δ₁::AbstractMatrix, Δ₂::AbstractMatrix)Compute the metric for vectors Δ₁ and Δ₂ at Y.
The representation of the Grassmann manifold is realized as a quotient space of the Stiefel manifold.
The metric for the Grassmann manifold is:
\[g^{Gr}_Y(\Delta_1, \Delta_2) = g^{St}_Y(\Delta_1, \Delta_2) = \mathrm{Tr}(\Delta_1^T (\mathbb{I} - Y Y^T) \Delta_2) = \mathrm{Tr}(\Delta_1^T \Delta_2),\]
where we used that $Y^T\Delta_i = \mathbb{O}$ for $i = 1, 2.$
GeometricOptimizers.metric — Method
metric(Y::StiefelManifold, Δ₁::AbstractMatrix, Δ₂::AbstractMatrix)Compute the dot product for Δ₁ and Δ₂ at Y.
This uses the canonical Riemannian metric for the Stiefel manifold:
\[g_Y: (\Delta_1, \Delta_2) \mapsto \mathrm{Tr}(\Delta_1^T(\mathbb{I} - \frac{1}{2}YY^T)\Delta_2).\]
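With plain arrays the canonical metric reads as follows (a standalone sketch of the formula above; `canonical_metric` is a hypothetical name, not the package's metric method):

```julia
using LinearAlgebra

# Canonical Stiefel metric g_Y(Δ₁, Δ₂) = Tr(Δ₁ᵀ(I - ½YYᵀ)Δ₂) on plain arrays.
canonical_metric(Y, Δ₁, Δ₂) = tr(Δ₁' * (I - Y * Y' / 2) * Δ₂)

Y = [1.0 0.0; 0.0 1.0; 0.0 0.0]     # orthonormal 3×2 frame
Δ = [0.0 -1.0; 1.0 0.0; 2.0 3.0]    # tangent vector at Y (YᵀΔ is skew)
canonical_metric(Y, Δ, Δ)
```

Since $\mathbb{I} - \frac{1}{2}YY^T$ is symmetric, the metric is symmetric in its two arguments.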
GeometricOptimizers.optimization_step! — Method
optimization_step!(o, λY, ps, dx)Update the weights ps based on an Optimizer o, a global section λY and first-order derivatives dx.
optimization_step! calls update! internally. update! has to be implemented for every OptimizerMethod.
Arguments
All arguments into optimization_step! are mandatory:
- o::Optimizer,
- λY::NamedTuple: this named tuple has the same keys as ps, but contains GlobalSections,
- ps::NamedTuple: the neural network parameters,
- dx::NamedTuple: the gradients stored as a NamedTuple.
All the arguments are given as NamedTuples as the neural network weights are stored in that format.
using GeometricOptimizers
using GeometricOptimizers: MomentumCache, Momentum, apply_toNT, geodesic, optimization_step!
ps = (weight = rand(StiefelManifold{Float32}, 5, 3), )
cache = apply_toNT(MomentumCache, ps)
o = Optimizer(Momentum(), cache, 0, geodesic)
λY = GlobalSection(ps)
dx = (weight = rand(Float32, 5, 3), )
# call the optimizer
optimization_step!(o, λY, ps, dx)
_test_nt(x) = typeof(x) <: NamedTuple
_test_nt(λY) & _test_nt(ps) & _test_nt(cache) & _test_nt(dx)
# output
true
Extended help
The derivatives dx here are usually obtained via an AD routine by differentiating a loss function, i.e. dx is $\nabla_xL$.
GeometricOptimizers.rgrad — Method
rgrad(Y::GrassmannManifold, ∇L::AbstractMatrix)Compute the Riemannian gradient for the Grassmann manifold at Y based on ∇L.
Here $Y$ is a representation of $\mathrm{span}(Y)\in{}Gr(n, N)$ and $\nabla{}L\in\mathbb{R}^{N\times{}n}$ is the Euclidean gradient.
This gradient has the property that it is orthogonal to the space spanned by $Y$.
The precise form of the mapping is:
\[\mathtt{rgrad}(Y, \nabla{}L) \mapsto \nabla{}L - YY^T\nabla{}L.\]
Note the property $Y^T\mathrm{rgrad}(Y, \nabla{}L) = \mathbb{O}.$
Also see rgrad(::StiefelManifold, ::AbstractMatrix).
Examples
using GeometricOptimizers
Y = GrassmannManifold([1 0 ; 0 1 ; 0 0; 0 0])
Δ = [1 2; 3 4; 5 6; 7 8]
rgrad(Y, Δ)
# output
4×2 Matrix{Int64}:
0 0
0 0
5 6
7 8
GeometricOptimizers.rgrad — Method
rgrad(Y::StiefelManifold, ∇L::AbstractMatrix)Compute the Riemannian gradient for the Stiefel manifold at Y based on ∇L.
Here $Y\in{}St(N,n)$ and $\nabla{}L\in\mathbb{R}^{N\times{}n}$ is the Euclidean gradient.
The function computes the Riemannian gradient with respect to the canonical metric: metric(::StiefelManifold, ::AbstractMatrix, ::AbstractMatrix).
The precise form of the mapping is:
\[\mathtt{rgrad}(Y, \nabla{}L) \mapsto \nabla{}L - Y(\nabla{}L)^TY\]
Note the property $Y^T\mathtt{rgrad}(Y, \nabla{}L)\in\mathcal{S}_\mathrm{skew}(n).$
Examples
using GeometricOptimizers
Y = StiefelManifold([1 0 ; 0 1 ; 0 0; 0 0])
Δ = [1 2; 3 4; 5 6; 7 8]
rgrad(Y, Δ)
# output
4×2 Matrix{Int64}:
0 -1
1 0
5 6
7 8
GeometricOptimizers.rhs — Method
rhs(cache)Return the right hand side of an instance of BFGSCache
GeometricOptimizers.rhs — Method
rhs(cache)Return the right hand side of an instance of DFPCache
GeometricOptimizers.rhs — Method
rhs(cache)Return the right hand side of an instance of NewtonOptimizerCache
GeometricOptimizers.solver_step! — Method
solver_step!(x, state, opt)Compute a full iterate for an EuclideanOptimizer.
Examples
julia> f(x) = sum(x .^ 2 + x .^ 3 / 3);
julia> x = [1f0, 2f0]
2-element Vector{Float32}:
1.0
2.0
julia> opt = EuclideanOptimizer(x, f; algorithm = Newton());
julia> state = NewtonOptimizerState(x);
julia> update!(state, gradient(opt), x);
julia> solver_step!(x, state, opt)
2-element Vector{Float32}:
0.25
0.6666666
GeometricOptimizers.value — Method
value(obj::AbstractOptimizerProblem, x)Evaluate the value at x (i.e. compute obj.F(x)).
GeometricOptimizers.Ω — Method
Ω(Y::GrassmannManifold{T}, Δ::AbstractMatrix{T}) where TPerform the canonical horizontal lift for the Grassmann manifold:
\[ \Delta \mapsto \Omega^{St}(\Delta),\]
where $\Omega^{St}$ is the canonical horizontal lift for the Stiefel manifold.
using GeometricOptimizers
using GeometricOptimizers: StiefelProjection
E = GrassmannManifold(StiefelProjection(5, 2))
Δ = [0. 0.; 0. 0.; 2. 3.; 4. 5.; 6. 7.]
GeometricOptimizers.Ω(E, Δ)
# output
5×5 SkewSymMatrix{Float64, Vector{Float64}}:
0.0 -0.0 -2.0 -4.0 -6.0
0.0 0.0 -3.0 -5.0 -7.0
2.0 3.0 0.0 -0.0 -0.0
4.0 5.0 0.0 0.0 -0.0
6.0 7.0 0.0 0.0 0.0
GeometricOptimizers.Ω — Method
Ω(Y::StiefelManifold{T}, Δ::AbstractMatrix{T}) where TPerform canonical horizontal lift for the Stiefel manifold:
\[ \Delta \mapsto (\mathbb{I} - \frac{1}{2}YY^T)\Delta{}Y^T - Y\Delta^T(\mathbb{I} - \frac{1}{2}YY^T).\]
Internally this performs
SkewSymMatrix(2 * (I(n) - .5 * Y * Y') * Δ * Y')
It uses SkewSymMatrix to save memory.
Examples
using GeometricOptimizers
using GeometricOptimizers: StiefelProjection
E = StiefelManifold(StiefelProjection(5, 2))
Δ = [0. -1.; 1. 0.; 2. 3.; 4. 5.; 6. 7.]
GeometricOptimizers.Ω(E, Δ)
# output
5×5 SkewSymMatrix{Float64, Vector{Float64}}:
0.0 -1.0 -2.0 -4.0 -6.0
1.0 0.0 -3.0 -5.0 -7.0
2.0 3.0 0.0 -0.0 -0.0
4.0 5.0 0.0 0.0 -0.0
6.0 7.0 0.0 0.0 0.0
Note that the output of Ω is a skew-symmetric matrix, i.e. an element of $\mathfrak{g}$.
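A defining property of the horizontal lift is that $\Omega(Y, \Delta)Y = \Delta$ for tangent vectors $\Delta$. This can be checked with plain arrays; the following is a standalone sketch of the formula above, with `Ω_St` a hypothetical name:

```julia
using LinearAlgebra

# Ω(Y, Δ) = (I - ½YYᵀ)ΔYᵀ - YΔᵀ(I - ½YYᵀ) on plain arrays (standalone sketch).
Ω_St(Y, Δ) = (I - Y * Y' / 2) * Δ * Y' - Y * Δ' * (I - Y * Y' / 2)

Y = [1.0 0.0; 0.0 1.0; 0.0 0.0; 0.0 0.0; 0.0 0.0]   # orthonormal 5×2 frame
∇ = [1.0 2.0; 3.0 4.0; 5.0 6.0; 7.0 8.0; 9.0 10.0]
Δ = ∇ - Y * ∇' * Y        # rgrad-style projection: YᵀΔ is skew-symmetric
W = Ω_St(Y, Δ)
W * Y ≈ Δ                 # the lift reproduces Δ when applied to Y
```

The identity $W Y = \Delta$ follows from $Y^T\Delta + \Delta^TY = \mathbb{O}$, which holds for all tangent vectors at $Y$.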
GeometricOptimizers.𝔄 — Method
𝔄(B̂, B̄)Compute $\mathfrak{A}(B', B'') := \sum_{n=1}^\infty \frac{1}{n!} ((B'')^TB')^{n-1}.$
This expression has the property $\mathbb{I} + B'\mathfrak{A}(B', B'')(B'')^T = \exp(B'(B'')^T).$
Examples
using GeometricOptimizers
using GeometricOptimizers: 𝔄
import Random
Random.seed!(123)
B = rand(StiefelLieAlgHorMatrix, 10, 2)
B̂ = hcat(vcat(.5 * B.A, B.B), vcat(one(B.A), zero(B.B)))
B̄ = hcat(vcat(one(B.A), zero(B.B)), vcat(-.5 * B.A, -B.B))
one(B̂ * B̄') + B̂ * 𝔄(B̂, B̄) * B̄' ≈ exp(Matrix(B))
# output
true
GeometricOptimizers.𝔄 — Method
𝔄(A)Compute $\mathfrak{A}(A) := \sum_{n=1}^\infty \frac{1}{n!} A^{n-1}.$
Implementation
This uses a Taylor expansion that iteratively adds terms with
while norm(Aⁿ) > ε
    mul!(A_temp, Aⁿ, A)
    Aⁿ .= A_temp
    rmul!(Aⁿ, T(inv(n)))
    𝔄A += Aⁿ
    n += 1
end
until the norm of Aⁿ becomes smaller than machine precision. The counter n in the above algorithm is initialized as 2. The matrices Aⁿ and 𝔄A are initialized as the identity matrix.
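Put together, the loop amounts to the following standalone function. This is a sketch using only LinearAlgebra; the loop body allocates new matrices here, whereas the in-place variant above reuses buffers:

```julia
using LinearAlgebra

# 𝔄(A) = Σ_{n=1}^∞ A^{n-1}/n!, summed until the current term drops below ε
# (𝔄_taylor is a hypothetical name, not the package's internal routine).
function 𝔄_taylor(A::AbstractMatrix{T}; ε = eps(T)) where T
    𝔄A = Matrix{T}(I, size(A)...)   # n = 1 term: A⁰/1! = I
    Aⁿ = Matrix{T}(I, size(A)...)
    n = 2
    while norm(Aⁿ) > ε
        Aⁿ = Aⁿ * A / n             # A^{n-1}/n! from the previous term
        𝔄A += Aⁿ
        n += 1
    end
    𝔄A
end
```

By the property stated above, `I + A * 𝔄_taylor(A)` agrees with `exp(A)`.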
SimpleSolvers.direction — Method
direction(cache)Return the direction of the gradient step (i.e. Δx) of an instance of BFGSCache.
SimpleSolvers.direction — Method
direction(cache)Return the direction of the gradient step (i.e. Δx) of an instance of DFPCache.
SimpleSolvers.direction — Method
direction(cache)Return the direction of the gradient step (i.e. Δx) of an instance of NewtonOptimizerCache.
SimpleSolvers.linesearch_problem — Method
linesearch_problem(problem, cache)Create a SimpleSolvers.LinesearchProblem for a linesearch algorithm.
The variable on which this problem depends is $\alpha$.
Example
julia> x = [1, 0., 0.]
3-element Vector{Float64}:
1.0
0.0
0.0
julia> f = x -> sum(x .^ 3 / 6 + x .^ 2 / 2);
julia> obj = OptimizerProblem(f, x);
julia> grad = GradientAutodiff{Float64}(obj.F, length(x));
julia> hess = HessianAutodiff{Float64}(obj.F, length(x));
julia> cache = NewtonOptimizerCache(x);
julia> state = NewtonOptimizerState(x); update!(state, grad, x);
julia> params = (x = state.x,);
julia> update!(cache, state, grad, hess, x);
julia> ls_obj = linesearch_problem(obj, grad, cache);
julia> ls_obj.F(0., params)
0.6666666666666666
julia> ls_obj.D(0., params)
-1.125
Note that in the example above calling update! on the NewtonOptimizerCache requires a SimpleSolvers.Hessian.
SimpleSolvers.solve! — Method
solve!(x, state, opt)Solve the optimization problem described by opt::EuclideanOptimizer and store the result in x.
Examples
julia> f(x) = sum(x .^ 2 + x .^ 3 / 3);
julia> x = [1f0, 2f0]
2-element Vector{Float32}:
1.0
2.0
julia> opt = EuclideanOptimizer(x, f; algorithm = Newton());
julia> state = NewtonOptimizerState(x);
julia> solve!(x, state, opt)
GeometricOptimizers.OptimizerResult{Float32, Float32, Vector{Float32}, GeometricOptimizers.OptimizerStatus{Float32, Float32}}( * Convergence measures
|x - x'| = 7.82e-03
|x - x'|/|x'| = 2.56e+02
|f(x) - f(x')| = 6.18e-05
|f(x) - f(x')|/|f(x')| = 6.63e+04
|g(x) - g(x')| = 1.57e-02
|g(x)| = 6.10e-05
, Float32[4.6478817f-8, 3.0517578f-5], 9.313341f-10)
julia> x
2-element Vector{Float32}:
4.6478817f-8
3.0517578f-5
julia> iteration_number(state)
4
Also see solver_step!.