Global Tangent Spaces
In GeometricMachineLearning standard neural network optimizers are generalized to homogeneous spaces by leveraging the special structure of the tangent spaces of this class of manifolds. When we introduced homogeneous spaces we already mentioned that every tangent space to a homogeneous space $T_Y\mathcal{M}$ is of the form:
\[ T_Y\mathcal{M} = \mathfrak{g} \cdot Y := \{AY: A\in{}\mathfrak{g}\}.\]
We then have a decomposition of $\mathfrak{g}$ into a vertical part $\mathfrak{g}^{\mathrm{ver}, Y}$ and a horizontal part $\mathfrak{g}^{\mathrm{hor}, Y}$; the horizontal part is isomorphic to $T_Y\mathcal{M}$ via:
\[ \mathfrak{g}^{\mathrm{hor}, Y} = \{\Omega(Y, \Delta): \Delta\in{}T_Y\mathcal{M} \}.\]
We now identify a special element $E \in \mathcal{M}$ and designate the horizontal component $\mathfrak{g}^{\mathrm{hor}, E}$ as our global tangent space. We will refer to this global tangent space as $\mathfrak{g}^\mathrm{hor}$. We can now find a transformation from any $\mathfrak{g}^{\mathrm{hor}, Y}$ to $\mathfrak{g}^\mathrm{hor}$ and vice-versa (these spaces are isomorphic).
Let $A\in{}G$ be an element such that $AE = Y$. Then we have
\[A^{-1}\cdot\mathfrak{g}^{\mathrm{hor},Y}\cdot{}A = \mathfrak{g}^\mathrm{hor},\]
i.e. for every element $B\in\mathfrak{g}^\mathrm{hor}$ we can find a $B^Y \in \mathfrak{g}^{\mathrm{hor},Y}$ s.t. $B = A^{-1}B^YA$ (and vice-versa).
Proof
We first show that for every $B^Y\in\mathfrak{g}^{\mathrm{hor},Y}$ the element $A^{-1}B^YA$ is in $\mathfrak{g}^{\mathrm{hor}}$. First note that $A^{-1}B^YA\in\mathfrak{g}$ by a fundamental theorem of Lie group theory (closedness of the Lie algebra under adjoint action). Now assume that $A^{-1}B^YA$ is not fully contained in $\mathfrak{g}^\mathrm{hor}$, i.e. it also has a vertical component. Then we would lose information when performing $A^{-1}B^YA \mapsto A^{-1}B^YAE = A^{-1}B^YY$, but this contradicts the fact that $B^Y\in\mathfrak{g}^{\mathrm{hor},Y}$. We now have to prove that every $B\in\mathfrak{g}^\mathrm{hor}$ is the image of some element in $\mathfrak{g}^{\mathrm{hor}, Y}$. By an argument similar to the one above we can show that $ABA^{-1}\in\mathfrak{g}^{\mathrm{hor}, Y}$, and this element maps to $B$. Proving that the map is injective is now trivial.
We should note that we have written all Lie group and Lie algebra actions as simple matrix multiplications, like $AE = Y$. For some Lie groups and Lie algebras, such as the Lie group of isomorphisms on some domain $\mathcal{D}$, this notation may not be appropriate [21]. These Lie groups are however not relevant for what we use in GeometricMachineLearning and we will stick to regular matrix notation.
Global Sections
Note that the theorem above requires us to find an element $A\in{}G$ such that $AE = Y$. We will call such a mapping $\lambda:\mathcal{M}\to{}G$ a global section[1].
We call a mapping $\lambda$ from a homogeneous space $\mathcal{M}$ to its associated Lie group $G$ a global section if $\forall{}Y\in\mathcal{M}$ it satisfies:
\[\lambda(Y)E = Y,\]
where $E$ is the distinct element of the homogeneous space.
Note that in general global sections are not unique because the dimension of $G$ is in general greater than that of $\mathcal{M}$. We give an example of how to construct such a global section for the Stiefel and the Grassmann manifolds below.
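To make this non-uniqueness concrete, here is a minimal sketch (the variable names are ours; it uses Matrix on a GlobalSection, which is discouraged for performance reasons below, but convenient for a quick check): two independently constructed sections differ as matrices, yet both map $E$ to $Y$:
using GeometricMachineLearning
import Random
Random.seed!(123)
N, n = 5, 2
Y = rand(StiefelManifold, N, n)
E = StiefelProjection(N, n)
λY₁ = GlobalSection(Y) # two independently constructed global sections ...
λY₂ = GlobalSection(Y)
Matrix(λY₁) ≈ Matrix(λY₂) # false: the part orthogonal to Y is drawn randomly
Matrix(λY₁) * E ≈ Y.A # true: λ(Y)E = Y
Matrix(λY₂) * E ≈ Y.A # true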
The Global Tangent Space for the Stiefel Manifold
We now discuss the specific form of the global tangent space for the Stiefel manifold. We pick as distinct element $E$ (which is built by calling StiefelProjection):
\[E = \begin{bmatrix} \mathbb{I}_n \\ \mathbb{O} \end{bmatrix}\in{}St(n, N).\]
Based on this, elements of the vector space $\mathfrak{g}^{\mathrm{hor}, E} =: \mathfrak{g}^{\mathrm{hor}}$ are:
\[\bar{B} = \begin{pmatrix} A & -B^T \\ B & \mathbb{O} \end{pmatrix},\]
where $A$ is a skew-symmetric matrix of size $n\times{}n$ and $B$ is an arbitrary matrix of size $(N - n)\times{}n$. Arrays of type $\mathfrak{g}^{\mathrm{hor}, E} \equiv \mathfrak{g}^\mathrm{hor}$ are implemented in GeometricMachineLearning under the name StiefelLieAlgHorMatrix.
We can call this with e.g. a skew-symmetric matrix $A$ and an arbitrary matrix $B$:
using GeometricMachineLearning
N, n = 5, 2
A = rand(SkewSymMatrix, n)
2×2 SkewSymMatrix{Float64, Vector{Float64}}:
0.0 -0.521059
0.521059 0.0
B = rand(N - n, n)
3×2 Matrix{Float64}:
0.599737 0.413005
0.00326307 0.763306
0.475287 0.724234
The constructor is then called as follows:
B̄ = StiefelLieAlgHorMatrix(A, B, N, n)
5×5 StiefelLieAlgHorMatrix{Float64, SkewSymMatrix{Float64, Vector{Float64}}, Matrix{Float64}}:
0.0 -0.521059 -0.599737 -0.00326307 -0.475287
0.521059 0.0 -0.413005 -0.763306 -0.724234
0.599737 0.413005 0.0 0.0 0.0
0.00326307 0.763306 0.0 0.0 0.0
0.475287 0.724234 0.0 0.0 0.0
We can also call it with a matrix of shape $N\times{}N$:
B̄₂ = Matrix(B̄) # note that this does not have any special structure
StiefelLieAlgHorMatrix(B̄₂, n)
5×5 StiefelLieAlgHorMatrix{Float64, SkewSymMatrix{Float64, Vector{Float64}}, SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}}:
0.0 -0.521059 -0.599737 -0.00326307 -0.475287
0.521059 0.0 -0.413005 -0.763306 -0.724234
0.599737 0.413005 0.0 0.0 0.0
0.00326307 0.763306 0.0 0.0 0.0
0.475287 0.724234 0.0 0.0 0.0
Or we can call it on an element of $T_E\mathcal{M}\subset\mathbb{R}^{N\times{}n}$, i.e. a matrix of shape $N\times{}n$:
E = StiefelProjection(N, n)
5×2 StiefelProjection{Float64, Matrix{Float64}}:
1.0 0.0
0.0 1.0
0.0 0.0
0.0 0.0
0.0 0.0
B̄₃ = B̄ * E
StiefelLieAlgHorMatrix(B̄₃, n)
5×5 StiefelLieAlgHorMatrix{Float64, SkewSymMatrix{Float64, Vector{Float64}}, SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}}:
0.0 -0.521059 -0.599737 -0.00326307 -0.475287
0.521059 0.0 -0.413005 -0.763306 -0.724234
0.599737 0.413005 0.0 0.0 0.0
0.00326307 0.763306 0.0 0.0 0.0
0.475287 0.724234 0.0 0.0 0.0
We now demonstrate how to map from an element of $\mathfrak{g}^{\mathrm{hor}, Y}$ to an element of $\mathfrak{g}^\mathrm{hor}$:
using GeometricMachineLearning: Ω, _round
Y = rand(StiefelManifold, N, n)
Δ = rgrad(Y, rand(N, n))
ΩΔ = Ω(Y, Δ)
λY = GlobalSection(Y)
λY_mat = Matrix(λY)
round.(λY_mat' * ΩΔ * λY_mat; digits = 3)
5×5 Matrix{Float64}:
-0.0 0.175 0.527 -0.746 -0.007
-0.175 0.0 0.72 -0.442 0.834
-0.527 -0.72 0.0 -0.0 0.0
0.746 0.442 0.0 -0.0 -0.0
0.007 -0.834 -0.0 -0.0 -0.0
Performing this computation directly is however computationally very inefficient and the user is strongly discouraged from calling Matrix on an instance of GlobalSection. The better option is calling global_rep:
_round(global_rep(λY, Δ); digits = 3)
5×5 StiefelLieAlgHorMatrix{Float64, SkewSymMatrix{Float64, Vector{Float64}}, Matrix{Float64}}:
0.0 0.175 0.527 -0.746 -0.007
-0.175 0.0 0.72 -0.442 0.834
-0.527 -0.72 0.0 0.0 0.0
0.746 0.442 0.0 0.0 0.0
0.007 -0.834 0.0 0.0 0.0
Internally GlobalSection calls the function GeometricMachineLearning.global_section which does the following for the Stiefel manifold:
A = randn(N, N - n) # draw (N - n) new random columns (or the gpu equivalent)
A = A - Y * (Y' * A) # subtract the part spanned by the columns of Y
Y⟂ = qr(A).Q[1:N, 1:(N - n)] # LinearAlgebra.qr; keep the first N - n columns of Q
So we draw $(N - n)$ new columns randomly, subtract the part that is spanned by the columns of $Y$ and then perform a $QR$ decomposition on the resulting matrix. The $Q$ part of the decomposition is a matrix of $(N - n)$ columns that is orthogonal to $Y$ and is typically referred to as $Y_\perp$ [20, 22, 23]. We can easily check that this $Y_\perp$ is indeed orthogonal to $Y$.
The matrix $Y_\perp$ constructed with the algorithm above satisfies
\[Y^TY_\perp = \mathbb{O}_{n\times{}(N-n)},\]
and
\[(Y_\perp)^TY_\perp = \mathbb{I}_{N-n},\]
i.e. all the columns in the big matrix $[Y, Y_\perp]\in\mathbb{R}^{N\times{}N}$ are mutually orthonormal; $[Y, Y_\perp]$ is therefore an element of $O(N)$ (and, after possibly flipping the sign of one column of $Y_\perp$, an element of $SO(N)$).
Proof
The second property is trivially satisfied because the $Q$ component of a $QR$ decomposition has orthonormal columns. For the first property note that $Y^TQR = \mathbb{O}$ because we have subtracted the component spanned by the columns of $Y$ from the matrix $A = QR$. The matrix $R\in\mathbb{R}^{N\times{}(N-n)}$ further has the property $[R]_{ij} = 0$ for $i > j$ and we have that
\[(Y^TQ)R = [r_{11}(Y^TQ)_{\bullet{}1}, r_{12}(Y^TQ)_{\bullet{}1} + r_{22}(Y^TQ)_{\bullet{}2}, \ldots, \sum_{i=1}^{N-n}r_{i(N-n)}(Y^TQ)_{\bullet{}i}].\]
Now all the diagonal coefficients $r_{ii}$ are non-zero because the matrix we performed the $QR$ decomposition on has full rank, so stepping through the columns from left to right shows that if $(Y^TQ)R$ is zero, the first $N-n$ columns of $Y^TQ$, i.e. $Y^TY_\perp$, also have to be zero.
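These two properties can also be verified numerically. The following is a minimal sketch, assuming only the library calls already used in this section:
using GeometricMachineLearning, LinearAlgebra
import Random
Random.seed!(123)
N, n = 5, 2
Y = rand(StiefelManifold, N, n)
A = randn(N, N - n)
A = A - Y.A * (Y.A' * A) # subtract the component spanned by the columns of Y
Y⟂ = qr(A).Q[1:N, 1:(N - n)] # first N - n columns of the Q factor
norm(Y.A' * Y⟂) # ≈ 0 up to round-off (first property)
Y⟂' * Y⟂ ≈ I(N - n) # true (second property)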
The function global_rep furthermore makes use of the following:
\[ \mathtt{global\_rep}(\lambda(Y), \Delta) = \lambda(Y)^T\Omega(Y,\Delta)\lambda(Y) = EY^T\Delta{}E^T + \begin{bmatrix} \mathbb{O} \\ \bar{\lambda}^T\Delta{}E^T \end{bmatrix} - \begin{bmatrix} \mathbb{O} & E\Delta^T\bar{\lambda} \end{bmatrix},\]
where $\lambda(Y) = [Y, \bar{\lambda}].$
Proof
We derive the expression above:
\[\begin{aligned} \lambda(Y)^T\Omega(Y,\Delta)\lambda(Y) & = \lambda(Y)^T[(\mathbb{I} - \frac{1}{2}YY^T)\Delta{}Y^T - Y\Delta^T(\mathbb{I} - \frac{1}{2}YY^T)]\lambda(Y) \\ & = \lambda(Y)^T[(\mathbb{I} - \frac{1}{2}YY^T)\Delta{}E^T - Y\Delta^T(\lambda(Y) - \frac{1}{2}YE^T)] \\ & = \lambda(Y)^T\Delta{}E^T - \frac{1}{2}EY^T\Delta{}E^T - E\Delta^T\lambda(Y) + \frac{1}{2}E\Delta^TYE^T \\ & = \begin{bmatrix} Y^T\Delta{}E^T \\ \bar{\lambda}^T\Delta{}E^T \end{bmatrix} - \frac{1}{2}EY^T\Delta{}E^T - \begin{bmatrix} E\Delta^TY & E\Delta^T\bar{\lambda} \end{bmatrix} + \frac{1}{2}E\Delta^TYE^T \\ & = \begin{bmatrix} Y^T\Delta{}E^T \\ \bar{\lambda}^T\Delta{}E^T \end{bmatrix} + E\Delta^TYE^T - \begin{bmatrix}E\Delta^TY & E\Delta^T\bar{\lambda} \end{bmatrix} \\ & = EY^T\Delta{}E^T + E\Delta^TYE^T - E\Delta^TYE^T + \begin{bmatrix} \mathbb{O} \\ \bar{\lambda}^T\Delta{}E^T \end{bmatrix} - \begin{bmatrix} \mathbb{O} & E\Delta^T\bar{\lambda} \end{bmatrix} \\ & = EY^T\Delta{}E^T + \begin{bmatrix} \mathbb{O}_{n\times{}N} \\ \bar{\lambda}^T\Delta{}E^T \end{bmatrix} - \begin{bmatrix} \mathbb{O}_{N\times{}n} & E\Delta^T\bar{\lambda} \end{bmatrix}, \end{aligned}\]
which proves our assertion; in the fifth line we used that $Y^T\Delta = -\Delta^TY$ is skew-symmetric because $\Delta\in{}T_Y\mathcal{M}$.
This expression for global_rep means that we only need the products $Y^T\Delta$ and $\bar{\lambda}^T\Delta$, and this is what is used internally.
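As a sanity check we can assemble the horizontal representation by hand from these two products and compare with global_rep. This is a hedged sketch assuming Y, Δ, λY and λY_mat from the demonstration above, and that calling SkewSymMatrix on a square matrix performs skew-symmetrization (compare the docstring of StiefelLieAlgHorMatrix below):
λ̄ = λY_mat[1:N, (n + 1):N] # slow; for verification only
A_block = SkewSymMatrix(Y.A' * Δ) # Y'Δ is skew up to round-off for Δ ∈ T_Y M
B_block = λ̄' * Δ
B̄_manual = StiefelLieAlgHorMatrix(A_block, B_block, N, n)
Matrix(B̄_manual) ≈ Matrix(global_rep(λY, Δ)) # true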
We now discuss the global tangent space for the Grassmann manifold. This is similar to the Stiefel case.
Global Tangent Space for the Grassmann Manifold
In the case of the Grassmann manifold we construct the global tangent space with respect to the distinct element $\mathcal{E}=\mathrm{span}(E)\in{}Gr(n,N)$, where $E$ is again the matrix from the Stiefel case (StiefelProjection).
The tangent space $T_\mathcal{E}Gr(n,N)$ can be represented through matrices:
\[\begin{pmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \\ b_{11} & \cdots & b_{1n} \\ \vdots & \ddots & \vdots \\ b_{(N-n)1} & \cdots & b_{(N-n)n} \end{pmatrix}.\]
This representation is based on the identification $T_\mathcal{E}Gr(n,N)\to{}T_E\mathcal{S}_E$ that was discussed in the section on the Grassmann manifold[2]. We use the following notation:
\[\mathfrak{g}^\mathrm{hor} = \mathfrak{g}^{\mathrm{hor},\mathcal{E}} = \left\{\begin{pmatrix} 0 & -B^T \\ B & 0 \end{pmatrix}: \text{$B\in\mathbb{R}^{(N-n)\times{}n}$ is arbitrary}\right\}.\]
This is equivalent to the horizontal component of $\mathfrak{g}$ for the Stiefel manifold for the case when $A$ is zero. This is a reflection of the rotational invariance of the Grassmann manifold: the skew-symmetric matrices $A$ are connected to the group of rotations $O(n)$ which is factored out in the Grassmann manifold $Gr(n,N)\simeq{}St(n,N)/O(n)$. In GeometricMachineLearning we thus treat the Grassmann manifold as being embedded in the Stiefel manifold. In [23] viewing the Grassmann manifold as a quotient space of the Stiefel manifold is also deemed important for "feasibility" in "practical computations".
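A minimal sketch of constructing such an element with GrassmannLieAlgHorMatrix (documented below); note that the $A$ block is fixed to zero:
using GeometricMachineLearning
N, n = 5, 2
B = rand(N - n, n)
B̄ = GrassmannLieAlgHorMatrix(B, N, n)
Matrix(B̄)[1:n, 1:n] == zeros(n, n) # true: no skew-symmetric A block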
Library Functions
GeometricMachineLearning.AbstractLieAlgHorMatrix — Type
AbstractLieAlgHorMatrix <: AbstractMatrix
AbstractLieAlgHorMatrix is a supertype for various horizontal components of Lie algebras. We usually call this $\mathfrak{g}^\mathrm{hor}$.
See StiefelLieAlgHorMatrix and GrassmannLieAlgHorMatrix for concrete examples.
GeometricMachineLearning.StiefelLieAlgHorMatrix — Type
StiefelLieAlgHorMatrix(A::SkewSymMatrix, B::AbstractMatrix, N::Integer, n::Integer)
Build an instance of StiefelLieAlgHorMatrix based on a skew-symmetric matrix A and an arbitrary matrix B.
An element of StiefelLieAlgHorMatrix takes the form:
\[\begin{pmatrix} A & -B^T \\ B & \mathbb{O} \end{pmatrix},\]
where $A$ is skew-symmetric (this is SkewSymMatrix in GeometricMachineLearning).
Also see GrassmannLieAlgHorMatrix.
Extended help
StiefelLieAlgHorMatrix is the horizontal component of the Lie algebra of skew-symmetric matrices (with respect to the canonical metric).
The projection here is $\pi:S \to SE$ where
\[E = \begin{bmatrix} \mathbb{I}_{n} \\ \mathbb{O}_{(N-n)\times{}n} \end{bmatrix}.\]
The matrix $E$ is implemented under StiefelProjection in GeometricMachineLearning.
GeometricMachineLearning.StiefelLieAlgHorMatrix — Method
StiefelLieAlgHorMatrix(D::AbstractMatrix, n::Integer)
Take a big matrix as input and build an instance of StiefelLieAlgHorMatrix.
The integer $N$ in $St(n, N)$ is the number of rows of D.
Extended help
If the constructor is called with a big $N\times{}N$ matrix, then the projection is performed the following way:
\[\begin{pmatrix} A & B_1 \\ B_2 & D \end{pmatrix} \mapsto \begin{pmatrix} \mathrm{skew}(A) & -B_2^T \\ B_2 & \mathbb{O} \end{pmatrix}.\]
The operation $\mathrm{skew}:\mathbb{R}^{n\times{}n}\to\mathcal{S}_\mathrm{skew}(n)$ is the skew-symmetrization operation. This is equivalent to calling SkewSymMatrix with an $n\times{}n$ matrix.
This can also be seen as the operation:
\[D \mapsto \Omega(E, DE) = \mathrm{skew}\left(2 \left(\mathbb{I} - \frac{1}{2} E E^T \right) DE E^T\right).\]
Also see GeometricMachineLearning.Ω.
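The equivalence of the block projection and the $\Omega$-based expression can be checked with plain linear algebra. A hedged sketch (the helper skew and all variable names are ours, not part of the library):
using LinearAlgebra
N, n = 5, 2
E = vcat(Matrix(1.0I, n, n), zeros(N - n, n))
D = randn(N, N)
skew(M) = (M - M') / 2
A, B₂ = D[1:n, 1:n], D[(n + 1):N, 1:n]
P_block = vcat(hcat(skew(A), -B₂'), hcat(B₂, zeros(N - n, N - n)))
P_Ω = 2 * skew((I - E * E' / 2) * D * E * E')
P_block ≈ P_Ω # true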
GeometricMachineLearning.GrassmannLieAlgHorMatrix — Type
GrassmannLieAlgHorMatrix(B::AbstractMatrix, N::Integer, n::Integer)
Build an instance of GrassmannLieAlgHorMatrix based on an arbitrary matrix B of size $(N-n)\times{}n$.
GrassmannLieAlgHorMatrix is the horizontal component of the Lie algebra of skew-symmetric matrices (with respect to the canonical metric).
Extended help
The projection here is: $\pi:S \to SE/\sim$ where
\[E = \begin{bmatrix} \mathbb{I}_{n} \\ \mathbb{O}_{(N-n)\times{}n} \end{bmatrix},\]
and the equivalence relation is
\[V_1 \sim V_2 \iff \exists A\in\mathcal{S}_\mathrm{skew}(n) \text{ such that } V_2 = V_1 + \begin{bmatrix} A \\ \mathbb{O} \end{bmatrix}.\]
An element of GrassmannLieAlgHorMatrix takes the form:
\[\begin{pmatrix} \bar{\mathbb{O}} & -B^T \\ B & \mathbb{O} \end{pmatrix},\]
where $\bar{\mathbb{O}}\in\mathbb{R}^{n\times{}n}$ and $\mathbb{O}\in\mathbb{R}^{(N - n)\times(N-n)}.$
GeometricMachineLearning.GrassmannLieAlgHorMatrix — Method
GrassmannLieAlgHorMatrix(D::AbstractMatrix, n::Integer)
Take a big matrix as input and build an instance of GrassmannLieAlgHorMatrix.
The integer $N$ in $Gr(n, N)$ here is the number of rows of D.
Extended help
If the constructor is called with a big $N\times{}N$ matrix, then the projection is performed the following way:
\[\begin{pmatrix} A & B_1 \\ B_2 & D \end{pmatrix} \mapsto \begin{pmatrix} \bar{\mathbb{O}} & -B_2^T \\ B_2 & \mathbb{O} \end{pmatrix}.\]
This can also be seen as the operation:
\[D \mapsto \Omega(E, DE - EE^TDE),\]
where $\Omega$ is the horizontal lift GeometricMachineLearning.Ω.
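Analogously, a brief sketch (again with our own variable names, not library code) checking the Grassmann projection against the formula $\Omega(Y, \Delta) = (\mathbb{I} - \frac{1}{2}YY^T)\Delta{}Y^T - Y\Delta^T(\mathbb{I} - \frac{1}{2}YY^T)$ used earlier in this section:
using LinearAlgebra
N, n = 5, 2
E = vcat(Matrix(1.0I, n, n), zeros(N - n, n))
D = randn(N, N)
Δ = D * E - E * (E' * D * E) # DE - EE'DE: the top n×n block is zeroed out
B₂ = Δ[(n + 1):N, 1:n]
P_block = vcat(hcat(zeros(n, n), -B₂'), hcat(B₂, zeros(N - n, N - n)))
P_Ω = (I - E * E' / 2) * Δ * E' - E * Δ' * (I - E * E' / 2) # Ω(E, Δ)
P_block ≈ P_Ω # true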
GeometricMachineLearning.GlobalSection — Type
GlobalSection(Y)
Construct a global section for Y.
A global section $\lambda$ is a mapping from a homogeneous space $\mathcal{M}$ to the corresponding Lie group $G$ such that
\[\lambda(Y)E = Y.\]
Also see apply_section and global_rep.
Implementation
For an implementation of GlobalSection for a custom array (especially manifolds), the function global_section has to be generalized.
Base.Matrix — Method
Matrix(λY::GlobalSection)
Put λY into matrix form.
This is not recommended if speed is important! Use apply_section and global_rep instead!
GeometricMachineLearning.apply_section — Function
apply_section(λY::GlobalSection{T, AT}, Y₂::AT) where {T, AT <: StiefelManifold{T}}
Apply λY to Y₂.
Mathematically this is the group action of the element $\lambda{}Y\in{}G$ on the element $Y_2$ of the homogeneous space $\mathcal{M}$.
Internally it calls apply_section!.
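A hedged usage sketch (variable names are ours; we assume the result is again a StiefelManifold and use the slow Matrix form only for the comparison):
using GeometricMachineLearning
import Random
Random.seed!(123)
N, n = 5, 2
Y = rand(StiefelManifold, N, n)
Y₂ = rand(StiefelManifold, N, n)
λY = GlobalSection(Y)
Y₃ = apply_section(λY, Y₂) # group action λ(Y)Y₂
Y₃.A ≈ Matrix(λY) * Y₂.A # true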
GeometricMachineLearning.apply_section! — Function
apply_section!(Y::AT, λY::GlobalSection{T, AT}, Y₂::AT) where {T, AT<:StiefelManifold{T}}
Apply λY to Y₂ and store the result in Y.
This is the inplace version of apply_section.
Base.:* — Method
λY * Y
Apply the element λY onto Y.
Here λY is an element of a Lie group and Y is an element of a homogeneous space.
GeometricMachineLearning.global_section — Method
global_section(Y::StiefelManifold)
Compute a matrix of size $N\times(N-n)$ whose columns are orthogonal to the columns in Y.
This matrix is also called $Y_\perp$ [20, 22, 23].
Examples
using GeometricMachineLearning
using GeometricMachineLearning: global_section
import Random
Random.seed!(123)
Y = StiefelManifold([1. 0.; 0. 1.; 0. 0.; 0. 0.])
round.(Matrix(global_section(Y)); digits = 3)
# output
4×2 Matrix{Float64}:
0.0 -0.0
0.0 0.0
0.936 -0.353
0.353 0.936
Further note that we convert the QRCompactWYQ object to a Matrix before we display it.
Implementation
The implementation is done with a QR decomposition (LinearAlgebra.qr!). Internally we do:
A = randn(N, N - n) # or the gpu equivalent
A = A - Y.A * (Y.A' * A) # subtract the part spanned by the columns of Y
qr!(A).Q
global_section(Y::GrassmannManifold)
Compute a matrix of size $N\times(N-n)$ whose columns are orthogonal to the columns in Y.
The method global_section for the Grassmann manifold is equivalent to that for the StiefelManifold (we represent the Grassmann manifold as an embedding in the Stiefel manifold).
See the documentation for global_section(Y::StiefelManifold{T}) where T.
GeometricMachineLearning.global_rep — Function
global_rep(λY::GlobalSection{T, AT}, Δ::AbstractMatrix{T}) where {T, AT<:StiefelManifold{T}}
Express Δ (an element of the tangent space of Y) as an instance of StiefelLieAlgHorMatrix.
This maps an element from $T_Y\mathcal{M}$ to an element of $\mathfrak{g}^\mathrm{hor}$.
These two spaces are isomorphic and the isomorphism is established through $\lambda(Y)\in{}G$ via:
\[T_Y\mathcal{M} \to \mathfrak{g}^{\mathrm{hor}}, \Delta \mapsto \lambda(Y)^{-1}\Omega(Y, \Delta)\lambda(Y).\]
Also see GeometricMachineLearning.Ω.
Examples
using GeometricMachineLearning
using GeometricMachineLearning: _round
import Random
Random.seed!(123)
Y = rand(StiefelManifold, 6, 3)
Δ = rgrad(Y, randn(6, 3))
λY = GlobalSection(Y)
_round(global_rep(λY, Δ); digits = 3)
# output
6×6 StiefelLieAlgHorMatrix{Float64, SkewSymMatrix{Float64, Vector{Float64}}, Matrix{Float64}}:
0.0 0.679 1.925 0.981 -2.058 0.4
-0.679 0.0 0.298 -0.424 0.733 -0.919
-1.925 -0.298 0.0 -1.815 1.409 1.085
-0.981 0.424 1.815 0.0 0.0 0.0
2.058 -0.733 -1.409 0.0 0.0 0.0
-0.4 0.919 -1.085 0.0 0.0 0.0
Implementation
The function global_rep does in fact not perform the entire map $\lambda(Y)^{-1}\Omega(Y, \Delta)\lambda(Y)$ but only
\[\Delta \mapsto \mathrm{skew}(Y^T\Delta),\]
to get the small skew-symmetric matrix $A\in\mathcal{S}_\mathrm{skew}(n)$ and
\[\Delta \mapsto \bar{\lambda}^T\Delta = (\lambda(Y)_{[1:N, (n+1):N]})^T\Delta,\]
to get the arbitrary matrix $B\in\mathbb{R}^{(N-n)\times{}n}$.
global_rep(λY::GlobalSection{T, AT}, Δ::AbstractMatrix{T}) where {T, AT<:GrassmannManifold{T}}
Express Δ (an element of the tangent space of Y) as an instance of GrassmannLieAlgHorMatrix.
The method global_rep for GrassmannManifold is similar to that for StiefelManifold.
Examples
using GeometricMachineLearning
using GeometricMachineLearning: _round
import Random
Random.seed!(123)
Y = rand(GrassmannManifold, 6, 3)
Δ = rgrad(Y, randn(6, 3))
λY = GlobalSection(Y)
_round(global_rep(λY, Δ); digits = 3)
# output
6×6 GrassmannLieAlgHorMatrix{Float64, Matrix{Float64}}:
0.0 0.0 0.0 0.981 -2.058 0.4
0.0 0.0 0.0 -0.424 0.733 -0.919
0.0 0.0 0.0 -1.815 1.409 1.085
-0.981 0.424 1.815 0.0 0.0 0.0
2.058 -0.733 -1.409 0.0 0.0 0.0
-0.4 0.919 -1.085 0.0 0.0 0.0
References
- [20] P.-A. Absil, R. Mahony and R. Sepulchre. Riemannian geometry of Grassmann manifolds with a view on algorithmic computation. Acta Applicandae Mathematica 80, 199–220 (2004).
- [22] P.-A. Absil, R. Mahony and R. Sepulchre. Optimization algorithms on matrix manifolds (Princeton University Press, Princeton, New Jersey, 2008).
- [23] T. Bendokat, R. Zimmermann and P.-A. Absil. A Grassmann manifold handbook: basic geometry and computational aspects, arXiv preprint arXiv:2011.13699 (2020).
- [7] B. Brantner. Generalizing Adam to manifolds for efficiently training transformers, arXiv preprint arXiv:2305.16901 (2023).
- [109] T. Frankel. The geometry of physics: an introduction (Cambridge University Press, Cambridge, UK, 2011).
- [1]: Global sections are also crucial for parallel transport in GeometricMachineLearning. A global section is first updated, i.e. $\Lambda^{(t)} \gets \mathrm{update}(\Lambda^{(t-1)})$; and on the basis of this we then update the element of the manifold $Y\in\mathcal{M}$ and the tangent vector $\Delta\in{}T\mathcal{M}$.
- [2]: We derived the following expression for the Riemannian gradient of the Grassmann manifold: $\mathrm{grad}_\mathcal{Y}^{Gr}L = \nabla_Y{}L - YY^T\nabla_YL$. Elements of the tangent space at $\mathcal{E}$ can thus be written as $\bar{B} - EE^T\bar{B}$ where $\bar{B}\in\mathbb{R}^{N\times{}n}$, and these matrices have the desired form.