Retractions
In practice we usually do not solve the geodesic equation exactly in each optimization step (even though this is possible and computationally feasible), but prefer approximations that are called "retractions" [10] for stability. The definition of a retraction in GeometricMachineLearning
is slightly different from how it is usually defined in textbooks [7, 10]. We discuss these differences here.
Classical Retractions
By "classical retraction" we here mean the textbook definition.
A classical retraction is a smooth map
\[R: T\mathcal{M}\to\mathcal{M}:(x,v)\mapsto{}R_x(v),\]
such that each curve $c(t) := R_x(tv)$ is a local approximation of a geodesic, i.e. the following two conditions hold:
- $c(0) = x$ and
- $c'(0) = v.$
Perhaps the most common example for matrix manifolds is the Cayley retraction:
The Cayley retraction is defined as
\[\mathrm{Cayley}(V_x) = \left(\mathbb{I} - \frac{1}{2}V_x\right)^{-1}\left(\mathbb{I} +\frac{1}{2}V_x\right).\]
We should mention that the factor $\frac{1}{2}$ is sometimes left out in the definition of the Cayley transform when used in different contexts. But it is necessary for defining a retraction.
We want to compare the geodesic
retraction with the cayley
retraction for the example we already introduced when talking about the exponential map:
η_increments = 0.1 : 0.1 : 5.5
Δ_increments = [Δ * η for η in η_increments]
Y_increments_geodesic = [geodesic(Y, Δ_increment) for Δ_increment in Δ_increments]
Y_increments_cayley = [cayley(Y, Δ_increment) for Δ_increment in Δ_increments]
Y_zeros = zeros(length(Y_increments_geodesic))
Y_geodesic_reshaped = [copy(Y_zeros), copy(Y_zeros), copy(Y_zeros)]
Y_cayley_reshaped = [copy(Y_zeros), copy(Y_zeros), copy(Y_zeros)]
zip_ob = zip(Y_increments_geodesic, Y_increments_cayley, axes(Y_increments_geodesic, 1))
for (Y_increment_geodesic, Y_increment_cayley, i) in zip_ob
for d in (1, 2, 3)
Y_geodesic_reshaped[d][i] = Y_increment_geodesic[d]
Y_cayley_reshaped[d][i] = Y_increment_cayley[d]
end
end
scatter!(ax, Y_geodesic_reshaped...;
color = mred, markersize = 5, label = rich("geodesic retraction"; color = text_color))
scatter!(ax, Y_cayley_reshaped...;
color = mblue, markersize = 5, label = rich("Cayley retraction"; color = text_color))
We see that for small $\Delta$ increments the Cayley retraction seems to match the geodesic retraction very well, but for larger values there is a notable discrepancy:
using LinearAlgebra: norm
discrepancies = [norm(Y_geo_inc - Y_cay_inc) for (Y_geo_inc, Y_cay_inc, _) in zip_ob]
In GeometricMachineLearning
The way we use retractions[1] in GeometricMachineLearning
is slightly different from their classical definition:
Given a section $\lambda:\mathcal{M}\to{}G$ a retraction is a map $\mathrm{Retraction}:\mathfrak{g}^\mathrm{hor}\to{}G$ such that
\[\Delta \mapsto \lambda(Y)\mathrm{Retraction}(\lambda(Y)^{-1}\Omega(\Delta)\lambda(Y))E,\]
is a classical retraction.
We now discuss how two of these retractions, the geodesic retraction (exponential map) and the Cayley retraction, are implemented in GeometricMachineLearning
.
Retractions for Homogeneous Spaces
Here we harness special properties of homogeneous spaces to obtain computationally efficient retractions for the Stiefel manifold and the Grassmann manifold. This is also discussed in e.g. [11, 15].
The geodesic retraction is a retraction whose associated curve is also the unique geodesic. For many matrix Lie groups (including $SO(N)$) geodesics are obtained by simply evaluating the exponential map [10, 16]:
The geodesic on a compact matrix Lie group $G$ with bi-invariant metric for $B\in{}T_AG$ is simply
\[\gamma(t) = \exp(t\cdot{}BA^{-1})A,\]
where $\exp:\mathcal{g}\to{}G$ is the matrix exponential map.
Because $SO(N)$ is compact and we furnish it with the canonical metric, i.e.
\[ g:T_AG\times{}T_AG \to \mathbb{R}, (B_1, B_2) \mapsto \mathrm{Tr}(B_1^TB_2) = \mathrm{Tr}((B_1A^{-1})^T(B_2A^{-1})),\]
its geodesics are thus equivalent to the exponential maps. We now use this observation to obtain expression for the geodesics on the Stiefel manifold $St(n, N)$. We use the following theorem from [16, Proposition 25.7]:
The geodesics for a naturally-reductive homogeneous space $\mathcal{M}$ starting at $Y$ are given by:
\[\gamma_{\Delta}(t) = \exp(t\cdot\Omega(\Delta))Y,\]
where the $\exp$ is the exponential map for the Lie group $G$ corresponding to $\mathcal{M}$.
The theorem requires the homogeneous space to be naturally reductive:
A homogeneous space is called naturally-reductive if the following two conditions hold:
- $A^{-1}BA\in\mathfrak{g}^\mathrm{hor}$ for every $B\in\mathfrak{g}^\mathrm{hor}$ and $A\in\exp(\mathfrak{g}^\mathrm{ver}$),
- $g([X, Y]^\mathrm{hor}, Z) = g(X, [Y, Z]^\mathrm{hor})$ for all $X, Y, Z \in \mathfrak{g}^\mathrm{hor}$,
where $[X, Y]^\mathrm{hor} = \Omega(XYE - YXE)$. If only the first condition holds the homogeneous space is called reductive but not naturally-reductive.
We state here without proof that the Stiefel manifold and the Grassmann manifold are naturally-reductive. An empirical verification of this is very easy:
using GeometricMachineLearning
B = rand(SkewSymMatrix, 6) # ∈ 𝔤
A = exp(B - StiefelLieAlgHorMatrix(B, 3)) # ∈ exp(𝔤ᵛᵉʳ)
X = rand(StiefelLieAlgHorMatrix, 6, 3) # ∈ 𝔤ʰᵒʳ
Y = rand(StiefelLieAlgHorMatrix, 6, 3) # ∈ 𝔤ʰᵒʳ
Z = rand(StiefelLieAlgHorMatrix, 6, 3) # ∈ 𝔤ʰᵒʳ
A' * X * A # this has to be in 𝔤ʰᵒʳ for St(3, 6) to be reductive
6×6 Matrix{Float64}:
0.0 -0.914247 -0.387374 -0.650933 -0.741877 -0.7678
0.914247 0.0 -0.771916 -0.454759 -0.869684 -0.80972
0.387374 0.771916 0.0 -0.924238 -0.292897 -0.749455
0.650933 0.454759 0.924238 0.0 0.0 0.0
0.741877 0.869684 0.292897 0.0 0.0 0.0
0.7678 0.80972 0.749455 0.0 0.0 0.0
verifies the first property and
using LinearAlgebra: tr
adʰᵒʳ(X, Y) = StiefelLieAlgHorMatrix(X * Y - Y * X, 3)
tr(adʰᵒʳ(X, Y)' * Z) ≈ tr(X' * adʰᵒʳ(Y, Z))
true
verifies the second.
In GeometricMachineLearning
we always work with elements in $\mathfrak{g}^\mathrm{hor}$ and the Lie group $G$ is always $SO(N)$. We hence use:
\[ \gamma_\Delta(t) = \exp(\lambda(Y)\lambda(Y)^{-1}\Omega(\Delta)\lambda(Y)\lambda(Y)^{-1})Y = \lambda(Y)\exp(\lambda(Y)^{-1}\Omega(\Delta)\lambda(Y))E,\]
where we have used that
\[ \exp(\Lambda{}B\Lambda^{-1}) = \sum_{n = 0}^\infty \frac{1}{n!}(\Lambda{}B\Lambda^{-1})^n = \sum_{n = 0}^\infty \frac{1}{n!}\underbrace{(\Lambda{}B\Lambda^{-1})}_{\text{$n$ times}} = \sum_{n = 0}^\infty \Lambda\frac{1}{n!}B^n\Lambda^{-1}.\]
Based on this we define the maps:
\[\mathtt{geodesic}: \mathfrak{g}^\mathrm{hor} \to G, B \mapsto \exp(B),\]
and
\[\mathtt{cayley}: \mathfrak{g}^\mathrm{hor} \to G, B \mapsto \mathrm{Cayley}(B),\]
where $B = \lambda(Y)^{-1}\Omega(\Delta)\lambda(Y)$. These expressions for geodesic
and cayley
are the ones that we typically use in GeometricMachineLearning
for computational reasons. We show how we can utilize the sparse structure of $\mathfrak{g}^\mathrm{hor}$ for computing the geodesic retraction and the Cayley retraction (i.e. the expressions $\exp(B)$ and $\mathrm{Cayley}(B)$ for $B\in\mathfrak{g}^\mathrm{hor}$). Similar derivations can be found in [15, 17, 18].
Further note that, even though the global section $\lambda:\mathcal{M} \to G$ is not unique, the final geodesic $\gamma_\Delta(t) = \lambda(Y)\exp(\lambda(Y)^{-1}\Omega(\Delta)\lambda(Y))E$ does not depend on the particular section we choose.
The Geodesic Retraction
An element of $\mathfrak{g}^\mathrm{hor}$ can be written as:
\[\begin{bmatrix} A & -B^T \\ B & \mathbb{O} \end{bmatrix} = \begin{bmatrix} \frac{1}{2}A & \mathbb{I} \\ B & \mathbb{O} \end{bmatrix} \begin{bmatrix} \mathbb{I} & \mathbb{O} \\ \frac{1}{2}A & -B^T \end{bmatrix} =: B'(B'')^T,\]
where we exploit the sparse structure of the array, i.e. it is a multiplication of a $N\times2n$ with a $2n\times{}N$ matrix.
We further use the following:
\[ \begin{aligned} \exp(B'(B'')^T) & = \sum_{n=0}^\infty \frac{1}{n!} (B'(B'')^T)^n = \mathbb{I} + \sum_{n=0}^\infty \frac{1}{n!} B'\sum_{n=1}^\infty B'((B'')^TB')(B'')^T \\ & = \mathbb{I} + B'\left( \sum_{n=1}^\infty \frac{1}{n!} ((B'')^TB')^{n-1} \right)B'' =: \mathbb{I} + B'\mathfrak{A}(B', B'')B'', \end{aligned}\]
where we defined $\mathfrak{A}(B', B'') := \sum_{n=1}^\infty \frac{1}{n!} ((B'')^TB')^{n-1}.$ Note that evaluating $\mathfrak{A}$ relies on computing products of small matrices of size $2n\times2n.$ We do this by relying on a simple Taylor expansion (see the docstring for GeometricMachineLearning.𝔄
).
The final expression we obtain is:
\[\exp(B) = \mathbb{I} + B' \mathfrak{A}(B', B'') (B'')^T\]
The Cayley Retraction
For the Cayley retraction we leverage the decomposition of $B = B'(B'')^T\in\mathfrak{g}^\mathrm{hor}$ through the Sherman-Morrison-Woodbury formula:
\[(\mathbb{I} - \frac{1}{2}B'(B'')^T)^{-1} = \mathbb{I} + \frac{1}{2}B'(\mathbb{I} - \frac{1}{2}B'(B'')^T)^{-1}(B'')^T\]
So what we have to compute the inverse of:
\[\mathbb{I} - \frac{1}{2}\begin{bmatrix} \mathbb{I} & \mathbb{O} \\ \frac{1}{2}A & -B^T \end{bmatrix}\begin{bmatrix} \frac{1}{2}A & \mathbb{I} \\ B & \mathbb{O} \end{bmatrix} = \begin{bmatrix} \mathbb{I} - \frac{1}{4}A & - \frac{1}{2}\mathbb{I} \\ \frac{1}{2}B^TB - \frac{1}{8}A^2 & \mathbb{I} - \frac{1}{4}A \end{bmatrix}.\]
By leveraging the sparse structure of the matrices in $\mathfrak{g}^\mathrm{hor}$ we arrive at the following expression for the Cayley retraction (similar to the case of the geodesic retraction):
\[\mathrm{Cayley}(B) = \mathbb{I} + \frac{1}{2} B' (\mathbb{I}_{2n} - \frac{1}{2} (B'')^T B')^{-1} (B'')^T (\mathbb{I} + \frac{1}{2} B),\]
where we have abbreviated $\mathbb{I} := \mathbb{I}_N.$ We conclude with a remark:
As mentioned previously the Lie group $SO(N)$, i.e. the one corresponding to the Stiefel manifold and the Grassmann manifold, has a bi-invariant Riemannian metric associated with it: $(B_1,B_2)\mapsto \mathrm{Tr}(B_1^TB_2)$. For other Lie groups (e.g. the symplectic group) the situation is slightly more difficult bendokat2021real.
Library Functions
GeometricMachineLearning.geodesic
— Methodgeodesic(Y::Manifold, Δ)
Take as input an element of a manifold Y
and a tangent vector in Δ
in the corresponding tangent space and compute the geodesic (exponential map).
In different notation: take as input an element $x$ of $\mathcal{M}$ and an element of $T_x\mathcal{M}$ and return $\mathtt{geodesic}(x, v_x) = \exp(v_x).$ For example:
Y = rand(StiefelManifold{Float64}, N, n)
Δ = rgrad(Y, rand(N, n))
geodesic(Y, Δ)
See the docstring for rgrad
for details on this function.
geodesic(B::StiefelLieAlgHorMatrix)
Compute the geodesic of an element in StiefelLieAlgHorMatrix
.
Implementation
This is using a computationally efficient version of the matrix exponential. See GeometricMachineLearning.𝔄
.
geodesic(B::GrassmannLieAlgHorMatrix)
Compute the geodesic of an element in GrassmannLieAlgHorMatrix
.
GeometricMachineLearning.geodesic
— Methodgeodesic(Y::Manifold, Δ)
Take as input an element of a manifold Y
and a tangent vector in Δ
in the corresponding tangent space and compute the geodesic (exponential map).
In different notation: take as input an element $x$ of $\mathcal{M}$ and an element of $T_x\mathcal{M}$ and return $\mathtt{geodesic}(x, v_x) = \exp(v_x).$ For example:
Y = rand(StiefelManifold{Float64}, N, n)
Δ = rgrad(Y, rand(N, n))
geodesic(Y, Δ)
See the docstring for rgrad
for details on this function.
geodesic(B::StiefelLieAlgHorMatrix)
Compute the geodesic of an element in StiefelLieAlgHorMatrix
.
Implementation
This is using a computationally efficient version of the matrix exponential. See GeometricMachineLearning.𝔄
.
geodesic(B::GrassmannLieAlgHorMatrix)
Compute the geodesic of an element in GrassmannLieAlgHorMatrix
.
GeometricMachineLearning.cayley
— Methodcayley(Y::Manifold, Δ)
Take as input an element of a manifold Y
and a tangent vector in Δ
in the corresponding tangent space and compute the Cayley retraction.
In different notation: take as input an element $x$ of $\mathcal{M}$ and an element of $T_x\mathcal{M}$ and return $\mathrm{Cayley}(v_x).$ For example:
Y = rand(StiefelManifold{Float64}, N, n)
Δ = rgrad(Y, rand(N, n))
cayley(Y, Δ)
See the docstring for rgrad
for details on this function.
cayley(B::StiefelLieAlgHorMatrix)
Compute the Cayley retraction of B
and multiply it with E
(the distinct element of the Stiefel manifold).
cayley(B::GrassmannLieAlgHorMatrix)
Compute the Cayley retraction of B
and multiply it with E
(the distinct element of the Stiefel manifold).
GeometricMachineLearning.cayley
— Methodcayley(Y::Manifold, Δ)
Take as input an element of a manifold Y
and a tangent vector in Δ
in the corresponding tangent space and compute the Cayley retraction.
In different notation: take as input an element $x$ of $\mathcal{M}$ and an element of $T_x\mathcal{M}$ and return $\mathrm{Cayley}(v_x).$ For example:
Y = rand(StiefelManifold{Float64}, N, n)
Δ = rgrad(Y, rand(N, n))
cayley(Y, Δ)
See the docstring for rgrad
for details on this function.
cayley(B::StiefelLieAlgHorMatrix)
Compute the Cayley retraction of B
and multiply it with E
(the distinct element of the Stiefel manifold).
cayley(B::GrassmannLieAlgHorMatrix)
Compute the Cayley retraction of B
and multiply it with E
(the distinct element of the Stiefel manifold).
GeometricMachineLearning.cayley
— Methodcayley(Y::Manifold, Δ)
Take as input an element of a manifold Y
and a tangent vector in Δ
in the corresponding tangent space and compute the Cayley retraction.
In different notation: take as input an element $x$ of $\mathcal{M}$ and an element of $T_x\mathcal{M}$ and return $\mathrm{Cayley}(v_x).$ For example:
Y = rand(StiefelManifold{Float64}, N, n)
Δ = rgrad(Y, rand(N, n))
cayley(Y, Δ)
See the docstring for rgrad
for details on this function.
GeometricMachineLearning.𝔄
— Function𝔄(A)
Compute $\mathfrak{A}(A) := \sum_{n=1}^\infty \frac{1}{n!} (A)^{n-1}.$
Implementation
This uses a Taylor expansion that iteratively adds terms with
while norm(Aⁿ) > ε
mul!(A_temp, Aⁿ, A)
Aⁿ .= A_temp
rmul!(Aⁿ, inv(n))
𝔄 += B
n += 1
end
until the norm of Aⁿ
becomes smaller than machine precision. The counter n
in the above algorithm is initialized as 2
The matrices Aⁿ
and 𝔄
are initialized as the identity matrix.
𝔄(B̂, B̄)
Compute $\mathfrak{A}(B', B'') := \sum_{n=1}^\infty \frac{1}{n!} ((B'')^TB')^{n-1}.$
This expression has the property $\mathbb{I} + B'\mathfrak{A}(B', B'')(B'')^T = \exp(B'(B'')^T).$
Examples
using GeometricMachineLearning
using GeometricMachineLearning: 𝔄
import Random
Random.seed!(123)
B = rand(StiefelLieAlgHorMatrix, 10, 2)
B̂ = hcat(vcat(.5 * B.A, B.B), vcat(one(B.A), zero(B.B)))
B̄ = hcat(vcat(one(B.A), zero(B.B)), vcat(-.5 * B.A, -B.B))
one(B̂ * B̄') + B̂ * 𝔄(B̂, B̄) * B̄' ≈ exp(Matrix(B))
# output
true
References
- [10]
- P.-A. Absil, R. Mahony and R. Sepulchre. Optimization algorithms on matrix manifolds (Princeton University Press, Princeton, New Jersey, 2008).
- [15]
- T. Bendokat and R. Zimmermann. The real symplectic Stiefel and Grassmann manifolds: metrics, geodesics and applications, arXiv preprint arXiv:2108.12447 (2021).
- [16]
- B. O'neill. Semi-Riemannian geometry with applications to relativity (Academic press, New York City, New York, 1983).
- 1Classical retractions are also defined in
GeometricMachineLearning
under the same name, i.e. there is e.g. a methodcayley(::StiefelLieAlgHorMatrix)
and a methodcayley(::StiefelManifold, ::AbstractMatrix)
(the latter being the classical retraction); but the user is strongly discouraged from using classical retractions as these are computational inefficient.