Homogeneous Spaces

Homogeneous spaces are very important in GeometricMachineLearning as we can generalize existing neural network optimizers from vector spaces to such homogenous spaces. They are intricately linked to the notion of a Lie Group and its Lie Algebra^[1].

Definition

A homogeneous space is a manifold $\mathcal{M}$ on which a Lie group $G$ acts transitively, i.e.

\[\forall X,Y\in\mathcal{M} \quad \exists{}A\in{}G\text{ s.t. }AX = Y.\]

Now fix a distinct element $E\in\mathcal{M}$; we will refer to this as the canonical element or StiefelProjection. We can also establish an isomorphism between $\mathcal{M}$ and the quotient space $G/\sim$ with the equivalence relation:

\[A_1 \sim A_2 \iff A_1E = A_2E.\]

Note that this is independent of the chosen $E$.

The tangent spaces of $\mathcal{M}$ are of the form $T_Y\mathcal{M} = \mathfrak{g}\cdot{}Y$, i.e. can be fully described through its Lie algebra. Based on this we can perform a splitting of $\mathfrak{g}$ into two parts:

Definition

A splitting of the Lie algebra $\mathfrak{g}$ at an element of a homogeneous space $Y$ is a decomposition into a vertical and a horizontal component, denoted by $\mathfrak{g} = \mathfrak{g}^{\mathrm{ver},Y} \oplus \mathfrak{g}^{\mathrm{hor},Y}$ such that

The vertical component $\mathfrak{g}^{\mathrm{ver},Y}$ is the kernel of the map $\mathfrak{g}\to{}T_Y\mathcal{M}, V \mapsto VY$, i.e. $\mathfrak{g}^{\mathrm{ver},Y} = \{V\in\mathfrak{g}:VY = 0\}.$
The horizontal component $\mathfrak{g}^{\mathrm{hor},Y}$ is the orthogonal complement of $\mathfrak{g}^{\mathrm{ver},Y}$ in $\mathfrak{g}$. It is isomorphic to $T_Y\mathcal{M}$.

Orthogonal complement means that $\forall{}V\in\mathfrak{g}^{\mathrm{ver}, Y}$ and $\forall{}B\in\mathfrak{g}^{\mathrm{hor}, Y}$ we have $\langle V, B \rangle = 0$ for some metric $\langle\cdot,\cdot\rangle$ defined on $\mathfrak{g}$.

We will refer to the isomorphism from $T_Y\mathcal{M}$ to $\mathfrak{g}^{\mathrm{hor}, Y}$ by $\Omega$. We will give explicit examples of $\Omega$ below. The metric $\langle\cdot,\cdot\rangle$ on $\mathfrak{g}$ further induces a Riemannian metric on $\mathcal{M}$:

\[g_Y(\Delta_1, \Delta_2) = \langle\Omega(Y,\Delta_1),\Omega(Y,\Delta_2)\rangle\text{ for $\Delta_1,\Delta_2\in{}T_Y\mathcal{M}$.}\]

Two examples of homogeneous spaces implemented in GeometricMachineLearning are the Stiefel manifold and the Grassmann manifold. The Lie group $SO(N)$ acts transitively on both of these manifolds, i.e. turns them into homogeneous spaces. We give its Lie algebra as an example here:

Example

The Lie algebra of $SO(N)$ are the skew-symmetric matrices $\mathfrak{so}(N):=\{V\in\mathbb{R}^{N\times{}N}:V^T + V = 0\}$ and the canonical metric associated with it is simply $(V_1,V_2)\mapsto\frac{1}{2}\mathrm{Tr}(V_1^TV_2)$.

The Stiefel Manifold

The Stiefel manifold $St(n, N)$ is the space of all orthonormal frames in $\mathbb{R}^{N\times{}n}$, i.e. matrices $Y\in\mathbb{R}^{N\times{}n}$ s.t. $Y^TY = \mathbb{I}_n$. It can also be seen as $SO(N)$ modulo an equivalence relation: $A\sim{}B\iff{}AE = BE$ for

\[E = \begin{bmatrix} \mathbb{I}_n \\ \mathbb{O} \end{bmatrix}\in{}St(n, N),\]

which is the canonical element of the Stiefel manifold that we call StiefelProjection. In words: the first $n$ columns of $A$ and $B$ are the same. We also use this principle to draw random elements from the Stiefel manifold.

Remark

Drawing random elements from the Stiefel (and the Grassmann) manifold is done by first calling rand(N, n) (i.e. drawing from a normal distribution) and then performing a $QR$ decomposition. We then take the first $n$ columns of the $Q$ matrix to be an element of the Stiefel manifold.

The tangent space to the element $Y\in{}St(n,N)$ can be determined by considering $C^\infty$ curves on $SO(N)$ through $\mathbb{I}.$ We write those curves as $t\mapsto{}A(t)$. Because $SO(N)$ acts transitively on $St(n, N)$ each $C^\infty$ curve on $St(n, N)$ through $Y$ can be written as $A(t)Y$ and we get:

\[T_YSt(n,N)=\{BY : B\in\mathfrak{g}\} = \{\Delta\in\mathbb{R}^{N\times{}n}: \Delta^TY + Y^T\Delta = \mathbb{O}\},\]

where the last equality^[2] can be established through the isomorphism:

\[\Omega: T_YSt(n, N) \to \mathfrak{g}^{\mathrm{hor}, Y}, \Delta \mapsto (\mathbb{I} - \frac{1}{2}YY^T)\Delta{}Y^T - Y\Delta^T(\mathbb{I} - \frac{1}{2}YY^T).\]

That this is an isomorphism can be easily checked:

\[ \Omega(\Delta)Y = (\mathbb{I} - \frac{1}{2}YY^T)\Delta - \frac{1}{2}Y\Delta^TY = \Delta.\]

This isomorphism is implemented in GeometricMachineLearning:

using GeometricMachineLearning: Ω
Y = rand(StiefelManifold, 5, 3)
Δ = rgrad(Y, rand(5, 3))
Ω(Y, Δ) * Y.A ≈ Δ

true

The function rgrad, which maps $\mathbb{R}^{N\times{}n}$ to $T_YSt(n, N)$ is introduced below. We can now also introduce the Riemannian metric on $St(n,N)$:

\[g_Y(\Delta_1, \Delta_2) = \mathrm{Tr}\left( \frac{1}{2} \Omega(\Delta_1)^T \Omega(\Delta_2) \right) = \mathrm{Tr}(\Delta_1^T(\mathbb{I} - \frac{1}{2}YY^T)\Delta_2).\]

We can check that this is true:

using LinearAlgebra: tr
Δ₂ = rgrad(Y, rand(5, 3))
.5 * tr(Ω(Y, Δ)' * Ω(Y, Δ₂)) ≈ metric(Y, Δ, Δ₂)

true

The Riemannian Gradient for the Stiefel Manifold

We defined the Riemannian gradient to be a vector field $\mathrm{grad}^gL$ such that it is compatible with the Riemannian metric in some sense; the definition we gave relied on an explicit coordinate chart. We can also express the Riemannian gradient for matrix manifolds by not relying on an explicit coordinate representation (which would be computationally expensive) [20].

Definition

Given a Riemannian matrix manifold $\mathcal{M}$ we define the Riemannian gradient of $L:\mathcal{M}\to\mathbb{R}$ at $Y$, called $\mathrm{grad}_YL\in{}T_Y\mathcal{M}$, as the unique element of $T_Y\mathcal{M}$ such that for any other $\Delta\in{}T_Y\mathcal{M}$ we have

\[\mathrm{Tr}((\nabla{}L)^T\Delta) = g_Y(\mathrm{grad}_YL, \Delta),\]

where Tr indicates the usual matrix trace.

For the Stiefel manifold the Riemannian gradient is given by:

\[ \mathrm{grad}_YL = \nabla_YL - Y(\nabla_YL)^TY =: \mathtt{rgrad}(Y, \nabla_YL),\]

where $\nabla_YL$ refers to the Euclidean gradient, i.e.

\[ [\nabla_YL]_{ij} = \frac{\partial{}L}{\partial{}y_{ij}}.\]

The Euclidean gradient $\nabla{}L$ can in practice be obtained with an AD routine. We then use the function rgrad to map $\nabla_YL$ from $\mathbb{R}^{N\times{}n}$ to $T_YSt(n,N)$. We can check that this mapping indeed produces the Riemannian gradient^[3]:

using LinearAlgebra: tr

Y = rand(StiefelManifold, 5, 3)
∇L = rand(5, 3)
gradL = rgrad(Y, ∇L)
Δ = rgrad(Y, rand(5, 3))

metric(Y, gradL, Δ) ≈ tr(∇L' * Δ)

true

The Grassmann Manifold

The Grassmann manifold is closely related to the Stiefel manifold, and an element of the Grassmann manifold can be represented through an element of the Stiefel manifold (but not vice-versa). An element of the Grassmann manifold $Gr(n,N)$ is a vector subspace $\subset\mathbb{R}^N$ of dimension $n$. Each such subspace (i.e. element of the Grassmann manifold) can be represented by a full-rank matrix $A\in\mathbb{R}^{N\times{}n}$ and we identify two elements with the following equivalence relation:

\[ A_1 \sim A_2 \iff \exists{}C\in\mathbb{R}^{n\times{}n}\text{ s.t. }A_1C = A_2.\]

The resulting manifold is of dimension $n(N-n)$. One can find a parametrization of the manifold the following way: Because the matrix $Y$ has full rank, there have to be $n$ independent columns in it: $i_1, \ldots, i_n$. For simplicity assume that $i_1 = 1, i_2=2, \ldots, i_n=n$ and call the matrix made up of these columns $C$. Then the mapping to the coordinate chart is: $YC^{-1}$ and the last $N-n$ columns are the coordinates.

We can also define the Grassmann manifold based on the Stiefel manifold since elements of the Stiefel manifold are already full-rank matrices. In this case we have the following equivalence relation (for $Y_1, Y_2\in{}St(n,N)$):

\[ Y_1 \sim Y_2 \iff \exists{}C\in{}SO(n)\text{ s.t. }Y_1C = Y_2.\]

In GeometricMachineLearning elements of the Grassmann manifold are drawn the same way as elements of the Stiefel manifold:

rand(GrassmannManifold{Float32}, 5, 3)

5×3 GrassmannManifold{Float32, Matrix{Float32}}:
 -0.0461715   0.0483366  -0.564308
  0.873444   -0.25248     0.244718
  0.45861     0.549674   -0.274063
  0.0829837   0.55703    -0.270157
  0.133247   -0.567005   -0.688167

The Riemannian Gradient of the Grassmann Manifold

Obtaining the Riemannian Gradient for the Grassmann manifold is slightly more difficult than it is in the case of the Stiefel manifold [20]. Since the Grassmann manifold can be obtained from the Stiefel manifold through an equivalence relation, we can however use this as a starting point.

Theorem

The Riemannian gradient of a function $L$ defined on the Grassmann manifold can be written as

\[\mathrm{grad}_\mathcal{Y}^{Gr}L \simeq \nabla_Y{}L - YY^T\nabla_YL,\]

where $\nabla_Y{}L$ is again the Euclidean gradient.

Proof

In a first step we identify charts on the Grassmann manifold to make dealing with it easier. For this consider the following open cover of the Grassmann manifold.

\[\{\mathcal{U}_W\}_{W\in{}St(n, N)} \quad\text{where}\quad \mathcal{U}_W = \{\mathrm{span}(Y):\mathrm{det}(W^TY)\neq0\}.\]

We can find a canonical bijective mapping from the set $\mathcal{U}_W$ to the set $\mathcal{S}_W := \{Y\in\mathbb{R}^{N\times{}n}:W^TY=\mathbb{I}_n\}$:

\[\sigma_W: \mathcal{U}_W \to \mathcal{S}_W,\, \mathcal{Y}=\mathrm{span}(Y)\mapsto{}Y(W^TY)^{-1} =: \hat{Y}.\]

That $\sigma_W$ is well-defined is easy to see: Consider $YC$ with $C\in\mathbb{R}^{n\times{}n}$ non-singular. Then $YC(W^TYC)^{-1}=Y(W^TY)^{-1} = \hat{Y}$. With this isomorphism we can also find a representation of elements of the tangent space:

\[T_\mathcal{Y}\sigma_W: T_\mathcal{Y}Gr(n,N)\to{}T_{\hat{Y}}\mathcal{S}_W.\]

We give an explicit representation of this isomorphism; because the map $\sigma_W$ does not care about the representation of $\mathrm{span}(Y)$ we can perform the variations in $St(n,N)$. We write the variations as $Y(t)\in{}St(n,N)$ for $t\in(-\varepsilon,\varepsilon)$. We also set $Y(0) = Y$ and hence

\[\frac{d}{dt}Y(t)(W^TY(t))^{-1} = (\dot{Y}(0) - Y(W^TY)^{-1}W^T\dot{Y}(0))(W^TY)^{-1},\]

where $\dot{Y}(0)\in{}T_YSt(n,N)$. Also note note that we have $T_\mathcal{Y}\mathcal{U}_W = T_\mathcal{Y}Gr(n,N)$ because $\mathcal{U}_W$ is an open subset of $Gr(n,N)$. We thus can identify the tangent space $T_\mathcal{Y}Gr(n,N)$ with the following set:

\[T_{\hat{Y}}\mathcal{S}_W = \{(\Delta - YW^T\Delta)(W^TY)^{-1}: Y\in{}St(n,N)\text{ s.t. }\mathrm{span}(Y)=\mathcal{Y}\text{ and }\Delta\in{}T_YSt(n,N)\}.\]

Further note that we can pick any element $W$ to construct the charts for a neighborhood around the point $\mathcal{Y}\in{}Gr(n,N)$ as long as we have $\mathrm{det}(W^TY)\neq0$ for $\mathrm{span}(Y)=\mathcal{Y}$. We hence take $W=Y$ and get the identification:

\[T_\mathcal{Y}Gr(n,N) \equiv \{\Delta - YY^T\Delta: Y\in{}St(n,N)\text{ s.t. }\mathrm{span}(Y)=\mathcal{Y}\text{ and }\Delta\in{}T_YSt(n,N)\},\]

which is very easy to handle computationally (we simply store and change the matrix $Y$ that represents an element of the Grassmann manifold). In this representation the Riemannian gradient is then

\[\mathrm{grad}_\mathcal{Y}^{Gr}L = \mathrm{grad}_Y^{St}L - YY^T\mathrm{grad}_Y^{St}L = \nabla_Y{}L - YY^T\nabla_YL,\]

where $\mathrm{grad}^{St}_YL$ is the Riemannian gradient of the Stiefel manifold at $Y$. We proved our assertion.

Library Functions

GeometricMachineLearning.StiefelManifold — Type

StiefelManifold <: Manifold

An implementation of the Stiefel manifold [1]. The Stiefel manifold is the collection of all matrices $Y\in\mathbb{R}^{N\times{}n}$ whose columns are orthonormal, i.e.

\[ St(n, N) = \{Y: Y^TY = \mathbb{I}_n \}.\]

The Stiefel manifold can be shown to have manifold structure (as the name suggests) and this is heavily used in GeometricMachineLearning. It is further a compact space. More information can be found in the docstrings for rgrad(::StiefelManifold, ::AbstractMatrix) and metric(::StiefelManifold, ::AbstractMatrix, ::AbstractMatrix).

source

GeometricMachineLearning.StiefelProjection — Type

StiefelProjection(backend, T, N, n)

Make a matrix of the form $\begin{bmatrix} \mathbb{I} & \mathbb{O} \end{bmatrix}^T$ for a specific backend and data type.

An array that essentially does vcat(I(n), zeros(N-n, n)) with GPU support.

Extended help

An instance of StiefelProjection should technically also belong to StiefelManifold.

source

GeometricMachineLearning.GrassmannManifold — Type

GrassmannManifold <: Manifold

The GrassmannManifold is based on the StiefelManifold.

source

GeometricMachineLearning.metric — Method

metric(Y::StiefelManifold, Δ₁::AbstractMatrix, Δ₂::AbstractMatrix)

Compute the dot product for Δ₁ and Δ₂ at Y.

This uses the canonical Riemannian metric for the Stiefel manifold:

\[g_Y: (\Delta_1, \Delta_2) \mapsto \mathrm{Tr}(\Delta_1^T(\mathbb{I} - \frac{1}{2}YY^T)\Delta_2).\]

source

GeometricMachineLearning.rgrad — Method

rgrad(Y::StiefelManifold, ∇L::AbstractMatrix)

Compute the Riemannian gradient for the Stiefel manifold at Y based on ∇L.

Here $Y\in{}St(N,n)$ and $\nabla{}L\in\mathbb{R}^{N\times{}n}$ is the Euclidean gradient.

The function computes the Riemannian gradient with respect to the canonical metric: metric(::StiefelManifold, ::AbstractMatrix, ::AbstractMatrix).

The precise form of the mapping is:

\[\mathtt{rgrad}(Y, \nabla{}L) \mapsto \nabla{}L - Y(\nabla{}L)^TY\]

Note the property $Y^T\mathtt{rgrad}(Y, \nabla{}L)\in\mathcal{S}_\mathrm{skew}(n).$

Examples

using GeometricMachineLearning

Y = StiefelManifold([1 0 ; 0 1 ; 0 0; 0 0])
Δ = [1 2; 3 4; 5 6; 7 8]
rgrad(Y, Δ)

# output

4×2 Matrix{Int64}:
 0  -1
 1   0
 5   6
 7   8

source

GeometricMachineLearning.metric — Method

metric(Y::GrassmannManifold, Δ₁::AbstractMatrix, Δ₂::AbstractMatrix)

Compute the metric for vectors Δ₁ and Δ₂ at Y.

The representation of the Grassmann manifold is realized as a quotient space of the Stiefel manifold.

The metric for the Grassmann manifold is:

\[g^{Gr}_Y(\Delta_1, \Delta_2) = g^{St}_Y(\Delta_1, \Delta_2) = \mathrm{Tr}(\Delta_1^T (\mathbb{I} - Y Y^T) \Delta_2) = \mathrm{Tr}(\Delta_1^T \Delta_2),\]

where we used that $Y^T\Delta_i$ for $i = 1, 2.$

source

GeometricMachineLearning.rgrad — Method

rgrad(Y::GrassmannManifold, ∇L::AbstractMatrix)

Compute the Riemannian gradient for the Grassmann manifold at Y based on ∇L.

Here $Y$ is a representation of $\mathrm{span}(Y)\in{}Gr(n, N)$ and $\nabla{}L\in\mathbb{R}^{N\times{}n}$ is the Euclidean gradient.

This gradient has the property that it is orthogonal to the space spanned by $Y$.

The precise form of the mapping is:

\[\mathtt{rgrad}(Y, \nabla{}L) \mapsto \nabla{}L - YY^T\nabla{}L.\]

Note the property $Y^T\mathrm{rgrad}(Y, \nabla{}L) = \mathbb{O}.$

Also see rgrad(::StiefelManifold, ::AbstractMatrix).

Examples

using GeometricMachineLearning

Y = GrassmannManifold([1 0 ; 0 1 ; 0 0; 0 0])
Δ = [1 2; 3 4; 5 6; 7 8]
rgrad(Y, Δ)

# output

4×2 Matrix{Int64}:
 0  0
 0  0
 5  6
 7  8

source

GeometricMachineLearning.Ω — Method

Ω(Y::StiefelManifold{T}, Δ::AbstractMatrix{T}) where T

Perform canonical horizontal lift for the Stiefel manifold:

\[ \Delta \mapsto (\mathbb{I} - \frac{1}{2}YY^T)\Delta{}Y^T - Y\Delta^T(\mathbb{I} - \frac{1}{2}YY^T).\]

Internally this performs

SkewSymMatrix(2 * (I(n) - .5 * Y * Y') * Δ * Y')

It uses SkewSymMatrix to save memory.

Examples

using GeometricMachineLearning
E = StiefelManifold(StiefelProjection(5, 2))
Δ = [0. -1.; 1. 0.; 2. 3.; 4. 5.; 6. 7.]
GeometricMachineLearning.Ω(E, Δ)

# output

5×5 SkewSymMatrix{Float64, Vector{Float64}}:
 0.0  -1.0  -2.0  -4.0  -6.0
 1.0   0.0  -3.0  -5.0  -7.0
 2.0   3.0   0.0  -0.0  -0.0
 4.0   5.0   0.0   0.0  -0.0
 6.0   7.0   0.0   0.0   0.0

Note that the output of Ω is a skew-symmetric matrix, i.e. an element of $\mathfrak{g}$.

source

GeometricMachineLearning.Ω — Method

Ω(Y::GrassmannManifold{T}, Δ::AbstractMatrix{T}) where T

Perform the canonical horizontal lift for the Grassmann manifold:

\[ \Delta \mapsto \Omega^{St}(\Delta),\]

where $\Omega^{St}$ is the canonical horizontal lift for the Stiefel manifold.

using GeometricMachineLearning
E = GrassmannManifold(StiefelProjection(5, 2))
Δ = [0. 0.; 0. 0.; 2. 3.; 4. 5.; 6. 7.]
GeometricMachineLearning.Ω(E, Δ)

# output

5×5 SkewSymMatrix{Float64, Vector{Float64}}:
 0.0  -0.0  -2.0  -4.0  -6.0
 0.0   0.0  -3.0  -5.0  -7.0
 2.0   3.0   0.0  -0.0  -0.0
 4.0   5.0   0.0   0.0  -0.0
 6.0   7.0   0.0   0.0   0.0

source

References

[20]: P.-A. Absil, R. Mahony and R. Sepulchre. Riemannian geometry of Grassmann manifolds with a view on algorithmic computation. Acta Applicandae Mathematica 80, 199–220 (2004).
[109]: T. Frankel. The geometry of physics: an introduction (Cambridge university press, Cambridge, UK, 2011).
[37]: T. Bendokat and R. Zimmermann. The real symplectic Stiefel and Grassmann manifolds: metrics, geodesics and applications, arXiv preprint arXiv:2108.12447 (2021).

1Recall that a Lie group is a manifold that also has group structure. We say that a Lie group $G$ acts on a manifold $\mathcal{M}$ if there is a map $G\times\mathcal{M} \to \mathcal{M}$ such that $(ab)x = a(bx)$ for $a,b\in{}G$ and $x\in\mathcal{M}$. For us the Lie algebra belonging to a Lie group, denoted by $\mathfrak{g}$, is the tangent space to the identity element $T_\mathbb{I}G$.
2Note that we can easily check $\{BY : B\in\mathfrak{g}\} \subset \{\Delta\in\mathbb{R}^{N\times{}n}: \Delta^TY + Y^T\Delta = \mathbb{O}\}.$ The isomorphism is used to proof $\{\Delta\in\mathbb{R}^{N\times{}n}: \Delta^TY + Y^T\Delta = \mathbb{O}\} \subset \{BY : B\in\mathfrak{g}\}$ as $\Omega(\Delta)\in\mathfrak{g}^{\mathrm{hor},Y}\subset\mathfrak{g}$ and $\Delta = \Omega(\Delta)Y.$
3Here we are testing with a randomly drawn element $\Delta\in{}T_Y\mathcal{M}.$