Retractions

In practice we usually do not solve the geodesic equation exactly in each optimization step (even though this is possible and computationally feasible), but prefer approximations that are called "retractions" [10] for stability. The definition of a retraction in GeometricMachineLearning is slightly different from how it is usually defined in textbooks [7, 10]. We discuss these differences here.

Classical Retractions

By "classical retraction" we here mean the textbook definition.

Definition

A classical retraction is a smooth map

\[R: T\mathcal{M}\to\mathcal{M}:(x,v)\mapsto{}R_x(v),\]

such that each curve $c(t) := R_x(tv)$ is a local approximation of a geodesic, i.e. the following two conditions hold:

  1. $c(0) = x$ and
  2. $c'(0) = v.$

Perhaps the most common example for matrix manifolds is the Cayley retraction:

Example

The Cayley retraction is defined as

\[\mathrm{Cayley}(V_x) = \left(\mathbb{I} - \frac{1}{2}V_x\right)^{-1}\left(\mathbb{I} +\frac{1}{2}V_x\right).\]

We should mention that the factor $\frac{1}{2}$ is sometimes left out in the definition of the Cayley transform when it is used in other contexts, but it is necessary here: without it the curve $c(t) = \mathrm{Cayley}(tV_x)$ would satisfy $c'(0) = 2V_x$ instead of $c'(0) = V_x$ and hence not be a retraction.
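As a quick numerical sanity check we can verify both retraction conditions for the Cayley transform on $SO(3)$. This is a minimal sketch using plain LinearAlgebra; the skew-symmetric matrix V is an arbitrary stand-in for a tangent vector at the identity and the finite-difference tolerance is chosen by hand:

using LinearAlgebra

cay(V) = inv(I - .5 * V) * (I + .5 * V)

W = rand(3, 3); V = W - W' # skew-symmetric, i.e. an element of 𝔰𝔬(3)
c(t) = cay(t * V)          # curve through the identity

c(0.0) ≈ I                                             # condition 1: c(0) = x
isapprox((c(1e-6) - c(-1e-6)) / 2e-6, V; rtol = 1e-6)  # condition 2: c'(0) = v
c(1.0)' * c(1.0) ≈ I                                   # cay maps into SO(3)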

We now compare the geodesic retraction with the Cayley retraction for the example we already introduced when talking about the exponential map.
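As a minimal sketch (assuming, as in that section, a point on the sphere $St(1, 3)$; the plotting objects ax, mred, mblue and text_color also carry over from there), the setup is:

using GeometricMachineLearning

Y = rand(StiefelManifold{Float64}, 3, 1) # a point on St(1, 3), i.e. the sphere
Δ = rgrad(Y, rand(3, 1))                 # a tangent vector at Y

With this in place we can compute the retracted points: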

η_increments = 0.1 : 0.1 : 5.5
Δ_increments = [Δ * η for η in η_increments]

Y_increments_geodesic = [geodesic(Y, Δ_increment) for Δ_increment in Δ_increments]
Y_increments_cayley = [cayley(Y, Δ_increment) for Δ_increment in Δ_increments]



Y_zeros = zeros(length(Y_increments_geodesic))
# one coordinate vector per spatial dimension, for plotting the retracted points
Y_geodesic_reshaped = [copy(Y_zeros), copy(Y_zeros), copy(Y_zeros)]
Y_cayley_reshaped = [copy(Y_zeros), copy(Y_zeros), copy(Y_zeros)]

zip_ob = zip(Y_increments_geodesic, Y_increments_cayley, axes(Y_increments_geodesic, 1))

# split the computed points into their three coordinates
for (Y_increment_geodesic, Y_increment_cayley, i) in zip_ob
    for d in (1, 2, 3)
        Y_geodesic_reshaped[d][i] = Y_increment_geodesic[d]
        Y_cayley_reshaped[d][i] = Y_increment_cayley[d]
    end
end

scatter!(ax, Y_geodesic_reshaped...;
        color = mred, markersize = 5, label = rich("geodesic retraction"; color = text_color))

scatter!(ax, Y_cayley_reshaped...;
        color = mblue, markersize = 5, label = rich("Cayley retraction"; color = text_color))

We see that for small $\Delta$ increments the Cayley retraction seems to match the geodesic retraction very well, but for larger values there is a notable discrepancy:

using LinearAlgebra: norm

discrepancies = [norm(Y_geo_inc - Y_cay_inc) for (Y_geo_inc, Y_cay_inc, _) in zip_ob]
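This behavior is expected: because of the factor $\frac{1}{2}$ the Cayley transform agrees with the matrix exponential up to second order, so the difference shrinks cubically in the step size. A stand-alone sketch (plain LinearAlgebra, independent of the manifold objects above) that makes the cubic decay visible:

using LinearAlgebra

W = rand(3, 3); V = W - W'                      # a skew-symmetric test matrix
cay(t) = inv(I - .5 * t * V) * (I + .5 * t * V)

errs = [norm(exp(t * V) - cay(t)) for t in (0.2, 0.1, 0.05)]
(errs[1] / errs[2], errs[2] / errs[3])          # both ratios should be ≈ 2³ = 8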

In GeometricMachineLearning

The way we use retractions[1] in GeometricMachineLearning is slightly different from their classical definition:

Definition

Given a section $\lambda:\mathcal{M}\to{}G$, a retraction is a map $\mathrm{Retraction}:\mathfrak{g}^\mathrm{hor}\to{}G$ such that

\[\Delta \mapsto \lambda(Y)\mathrm{Retraction}(\lambda(Y)^{-1}\Omega(\Delta)\lambda(Y))E,\]

is a classical retraction.
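To see why this yields a classical retraction, take $\mathrm{Retraction} = \exp$ and use $\lambda(Y)E = Y$ together with the defining property $\Omega(\Delta)Y = \Delta$ of the horizontal lift:

\[c(t) = \lambda(Y)\exp(t\cdot\lambda(Y)^{-1}\Omega(\Delta)\lambda(Y))E = \exp(t\cdot\Omega(\Delta))Y,\]

so $c(0) = Y$ and $c'(0) = \Omega(\Delta)Y = \Delta$, which are precisely the two conditions from above.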

We now discuss how two of these retractions, the geodesic retraction (exponential map) and the Cayley retraction, are implemented in GeometricMachineLearning.

Retractions for Homogeneous Spaces

Here we harness special properties of homogeneous spaces to obtain computationally efficient retractions for the Stiefel manifold and the Grassmann manifold. This is also discussed in e.g. [11, 15].

The geodesic retraction is a retraction whose associated curve is also the unique geodesic. For many matrix Lie groups (including $SO(N)$) geodesics are obtained by simply evaluating the exponential map [10, 16]:

Theorem

The geodesic on a compact matrix Lie group $G$ with a bi-invariant metric, starting at $A\in{}G$ with initial velocity $B\in{}T_AG$, is simply

\[\gamma(t) = \exp(t\cdot{}BA^{-1})A,\]

where $\exp:\mathfrak{g}\to{}G$ is the matrix exponential map.
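Before applying this, a quick numerical sanity check of the statement for $SO(3)$; this is a sketch with plain LinearAlgebra, where the random skew-symmetric matrices are arbitrary stand-ins:

using LinearAlgebra

W₁ = rand(3, 3); A = exp(W₁ - W₁')    # a point on SO(3)
W₂ = rand(3, 3); B = (W₂ - W₂') * A   # a tangent vector at A
γ(t) = exp(t * B * inv(A)) * A

γ(0.0) ≈ A                                              # γ(0) = A
isapprox((γ(1e-6) - γ(-1e-6)) / 2e-6, B; rtol = 1e-6)   # γ'(0) = B
γ(1.0)' * γ(1.0) ≈ I                                    # the curve stays on SO(3)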

Because $SO(N)$ is compact and we furnish it with the canonical metric, i.e.

\[ g:T_AG\times{}T_AG \to \mathbb{R}, (B_1, B_2) \mapsto \mathrm{Tr}(B_1^TB_2) = \mathrm{Tr}((B_1A^{-1})^T(B_2A^{-1})),\]

its geodesics are obtained by evaluating the exponential map. We now use this observation to obtain an expression for the geodesics on the Stiefel manifold $St(n, N)$. We use the following theorem from [16, Proposition 25.7]:

Theorem

The geodesics for a naturally-reductive homogeneous space $\mathcal{M}$ starting at $Y$ are given by:

\[\gamma_{\Delta}(t) = \exp(t\cdot\Omega(\Delta))Y,\]

where the $\exp$ is the exponential map for the Lie group $G$ corresponding to $\mathcal{M}$.

The theorem requires the homogeneous space to be naturally reductive:

Definition

A homogeneous space is called naturally-reductive if the following two conditions hold:

  1. $A^{-1}BA\in\mathfrak{g}^\mathrm{hor}$ for every $B\in\mathfrak{g}^\mathrm{hor}$ and $A\in\exp(\mathfrak{g}^\mathrm{ver})$,
  2. $g([X, Y]^\mathrm{hor}, Z) = g(X, [Y, Z]^\mathrm{hor})$ for all $X, Y, Z \in \mathfrak{g}^\mathrm{hor}$,

where $[X, Y]^\mathrm{hor} = \Omega(XYE - YXE)$. If only the first condition holds, the homogeneous space is called reductive but not naturally-reductive.

We state here without proof that the Stiefel manifold and the Grassmann manifold are naturally-reductive. An empirical verification of this is very easy:

using GeometricMachineLearning

B = rand(SkewSymMatrix, 6) # ∈ 𝔤
A = exp(B - StiefelLieAlgHorMatrix(B, 3)) # ∈ exp(𝔤ᵛᵉʳ)

X = rand(StiefelLieAlgHorMatrix, 6, 3) # ∈ 𝔤ʰᵒʳ
Y = rand(StiefelLieAlgHorMatrix, 6, 3) # ∈ 𝔤ʰᵒʳ
Z = rand(StiefelLieAlgHorMatrix, 6, 3) # ∈ 𝔤ʰᵒʳ

A' * X * A # this has to be in 𝔤ʰᵒʳ for St(3, 6) to be reductive
6×6 Matrix{Float64}:
 0.0       -0.914247  -0.387374  -0.650933  -0.741877  -0.7678
 0.914247   0.0       -0.771916  -0.454759  -0.869684  -0.80972
 0.387374   0.771916   0.0       -0.924238  -0.292897  -0.749455
 0.650933   0.454759   0.924238   0.0        0.0        0.0
 0.741877   0.869684   0.292897   0.0        0.0        0.0
 0.7678     0.80972    0.749455   0.0        0.0        0.0

verifies the first property and

using LinearAlgebra: tr
adʰᵒʳ(X, Y) = StiefelLieAlgHorMatrix(X * Y - Y * X, 3)

tr(adʰᵒʳ(X, Y)' * Z) ≈ tr(X' * adʰᵒʳ(Y, Z))
true

verifies the second.

In GeometricMachineLearning we always work with elements in $\mathfrak{g}^\mathrm{hor}$ and the Lie group $G$ is always $SO(N)$. We hence use:

\[ \gamma_\Delta(t) = \exp(\lambda(Y)\lambda(Y)^{-1}\Omega(\Delta)\lambda(Y)\lambda(Y)^{-1})Y = \lambda(Y)\exp(\lambda(Y)^{-1}\Omega(\Delta)\lambda(Y))E,\]

where we have used that

\[ \exp(\Lambda{}B\Lambda^{-1}) = \sum_{n = 0}^\infty \frac{1}{n!}(\Lambda{}B\Lambda^{-1})^n = \sum_{n = 0}^\infty \frac{1}{n!}\underbrace{(\Lambda{}B\Lambda^{-1})\cdots(\Lambda{}B\Lambda^{-1})}_{\text{$n$ times}} = \Lambda\left(\sum_{n = 0}^\infty \frac{1}{n!}B^n\right)\Lambda^{-1} = \Lambda\exp(B)\Lambda^{-1}.\]

Based on this we define the maps:

\[\mathtt{geodesic}: \mathfrak{g}^\mathrm{hor} \to G, B \mapsto \exp(B),\]

and

\[\mathtt{cayley}: \mathfrak{g}^\mathrm{hor} \to G, B \mapsto \mathrm{Cayley}(B),\]

where $B = \lambda(Y)^{-1}\Omega(\Delta)\lambda(Y)$. These expressions for geodesic and cayley are the ones we typically use in GeometricMachineLearning for computational reasons. We now show how to utilize the sparse structure of $\mathfrak{g}^\mathrm{hor}$ when computing the geodesic retraction and the Cayley retraction (i.e. the expressions $\exp(B)$ and $\mathrm{Cayley}(B)$ for $B\in\mathfrak{g}^\mathrm{hor}$). Similar derivations can be found in [15, 17, 18].

Remark

Further note that, even though the global section $\lambda:\mathcal{M} \to G$ is not unique, the final geodesic $\gamma_\Delta(t) = \lambda(Y)\exp(\lambda(Y)^{-1}\Omega(\Delta)\lambda(Y))E$ does not depend on the particular section we choose.
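We can check the identity $\gamma_\Delta(1) = \exp(\Omega(\Delta))Y$ numerically. The sketch below assumes the explicit form of the horizontal lift for the Stiefel manifold, $\Omega(\Delta) = (\mathbb{I} - \frac{1}{2}YY^T)\Delta{}Y^T - Y\Delta^T(\mathbb{I} - \frac{1}{2}YY^T)$ (see the section on global tangent spaces), and that manifold elements can be used like matrices:

using GeometricMachineLearning
using LinearAlgebra

Y = rand(StiefelManifold{Float64}, 6, 3)
Δ = rgrad(Y, rand(6, 3))

# the horizontal lift of Δ; it satisfies Ω(Δ)Y = Δ
Ω = (I - .5 * Y * Y') * Δ * Y' - Y * Δ' * (I - .5 * Y * Y')

exp(Ω) * Y ≈ geodesic(Y, Δ)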

The Geodesic Retraction

An element of $\mathfrak{g}^\mathrm{hor}$ can be written as:

\[\begin{bmatrix} A & -B^T \\ B & \mathbb{O} \end{bmatrix} = \begin{bmatrix} \frac{1}{2}A & \mathbb{I} \\ B & \mathbb{O} \end{bmatrix} \begin{bmatrix} \mathbb{I} & \mathbb{O} \\ \frac{1}{2}A & -B^T \end{bmatrix} =: B'(B'')^T,\]

where we exploit the sparse structure of the array: the right-hand side is the product of an $N\times2n$ matrix with a $2n\times{}N$ matrix.
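A quick check that this factorization reproduces a random element of $\mathfrak{g}^\mathrm{hor}$; here B̂ plays the role of $B'$ and B̄ the role of $B''$ (the same construction appears in the docstring of GeometricMachineLearning.𝔄 below):

using GeometricMachineLearning

B = rand(StiefelLieAlgHorMatrix, 10, 2)                     # N = 10, n = 2
B̂ = hcat(vcat(.5 * B.A, B.B), vcat(one(B.A), zero(B.B)))    # N × 2n
B̄ = hcat(vcat(one(B.A), zero(B.B)), vcat(-.5 * B.A, -B.B))  # N × 2n
B̂ * B̄' ≈ Matrix(B)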

We further use the following:

\[ \begin{aligned} \exp(B'(B'')^T) & = \sum_{n=0}^\infty \frac{1}{n!} (B'(B'')^T)^n = \mathbb{I} + \sum_{n=1}^\infty \frac{1}{n!} B'((B'')^TB')^{n-1}(B'')^T \\ & = \mathbb{I} + B'\left( \sum_{n=1}^\infty \frac{1}{n!} ((B'')^TB')^{n-1} \right)(B'')^T =: \mathbb{I} + B'\mathfrak{A}(B', B'')(B'')^T, \end{aligned}\]

where we defined $\mathfrak{A}(B', B'') := \sum_{n=1}^\infty \frac{1}{n!} ((B'')^TB')^{n-1}.$ Note that evaluating $\mathfrak{A}$ relies on computing products of small matrices of size $2n\times2n.$ We do this by relying on a simple Taylor expansion (see the docstring for GeometricMachineLearning.𝔄).

The final expression we obtain is:

\[\exp(B) = \mathbb{I} + B' \mathfrak{A}(B', B'') (B'')^T.\]

The Cayley Retraction

For the Cayley retraction we leverage the decomposition of $B = B'(B'')^T\in\mathfrak{g}^\mathrm{hor}$ through the Sherman-Morrison-Woodbury formula:

\[\left(\mathbb{I} - \frac{1}{2}B'(B'')^T\right)^{-1} = \mathbb{I} + \frac{1}{2}B'\left(\mathbb{I}_{2n} - \frac{1}{2}(B'')^TB'\right)^{-1}(B'')^T.\]

So the only thing we have to invert is the small matrix:

\[\mathbb{I} - \frac{1}{2}\begin{bmatrix} \mathbb{I} & \mathbb{O} \\ \frac{1}{2}A & -B^T \end{bmatrix}\begin{bmatrix} \frac{1}{2}A & \mathbb{I} \\ B & \mathbb{O} \end{bmatrix} = \begin{bmatrix} \mathbb{I} - \frac{1}{4}A & - \frac{1}{2}\mathbb{I} \\ \frac{1}{2}B^TB - \frac{1}{8}A^2 & \mathbb{I} - \frac{1}{4}A \end{bmatrix}.\]

By leveraging the sparse structure of the matrices in $\mathfrak{g}^\mathrm{hor}$ we arrive at the following expression for the Cayley retraction (similar to the case of the geodesic retraction):

\[\mathrm{Cayley}(B) = \mathbb{I} + \frac{1}{2} B + \frac{1}{2} B' \left(\mathbb{I}_{2n} - \frac{1}{2} (B'')^T B'\right)^{-1} (B'')^T \left(\mathbb{I} + \frac{1}{2} B\right),\]

where we have abbreviated $\mathbb{I} := \mathbb{I}_N.$
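This expression can again be checked numerically; the following sketch compares it against the dense Cayley transform (with B̂ and B̄ playing the roles of $B'$ and $B''$ as before):

using GeometricMachineLearning
using LinearAlgebra

B = rand(StiefelLieAlgHorMatrix, 10, 2)    # N = 10, n = 2, so 2n = 4
Bm = Matrix(B)
B̂ = hcat(vcat(.5 * B.A, B.B), vcat(one(B.A), zero(B.B)))
B̄ = hcat(vcat(one(B.A), zero(B.B)), vcat(-.5 * B.A, -B.B))

cayley_dense  = inv(I - .5 * Bm) * (I + .5 * Bm)
cayley_sparse = I + .5 * Bm + .5 * B̂ * inv(I(4) - .5 * B̄' * B̂) * B̄' * (I + .5 * Bm)
cayley_sparse ≈ cayley_dense

We conclude with a remark: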

Remark

As mentioned previously, the Lie group $SO(N)$, i.e. the one corresponding to the Stiefel manifold and the Grassmann manifold, has a bi-invariant Riemannian metric associated with it: $(B_1,B_2)\mapsto \mathrm{Tr}(B_1^TB_2)$. For other Lie groups (e.g. the symplectic group) the situation is slightly more difficult [15].

Library Functions

GeometricMachineLearning.geodesicMethod
geodesic(Y::Manifold, Δ)

Take as input an element Y of a manifold and a tangent vector Δ in the corresponding tangent space and compute the geodesic (exponential map).

In different notation: take as input an element $x$ of $\mathcal{M}$ and an element $v_x$ of $T_x\mathcal{M}$ and return $\mathtt{geodesic}(x, v_x) = \exp(v_x).$ For example:

N, n = 5, 3 # arbitrary dimensions with n ≤ N
Y = rand(StiefelManifold{Float64}, N, n)
Δ = rgrad(Y, rand(N, n))
geodesic(Y, Δ)

See the docstring for rgrad for details on this function.

source
geodesic(B::StiefelLieAlgHorMatrix)

Compute the geodesic of an element in StiefelLieAlgHorMatrix.

Implementation

This is using a computationally efficient version of the matrix exponential. See GeometricMachineLearning.𝔄.

source
geodesic(B::GrassmannLieAlgHorMatrix)

Compute the geodesic of an element in GrassmannLieAlgHorMatrix.

See geodesic(::StiefelLieAlgHorMatrix).

source
GeometricMachineLearning.cayleyMethod
cayley(Y::Manifold, Δ)

Take as input an element Y of a manifold and a tangent vector Δ in the corresponding tangent space and compute the Cayley retraction.

In different notation: take as input an element $x$ of $\mathcal{M}$ and an element $v_x$ of $T_x\mathcal{M}$ and return $\mathrm{Cayley}(v_x).$ For example:

N, n = 5, 3 # arbitrary dimensions with n ≤ N
Y = rand(StiefelManifold{Float64}, N, n)
Δ = rgrad(Y, rand(N, n))
cayley(Y, Δ)

See the docstring for rgrad for details on this function.

source
cayley(B::StiefelLieAlgHorMatrix)

Compute the Cayley retraction of B and multiply it with E (the distinct element of the Stiefel manifold).

source
cayley(B::GrassmannLieAlgHorMatrix)

Compute the Cayley retraction of B and multiply it with E (the distinct element of the Grassmann manifold).

See cayley(::StiefelLieAlgHorMatrix).

source
GeometricMachineLearning.𝔄Function
𝔄(A)

Compute $\mathfrak{A}(A) := \sum_{n=1}^\infty \frac{1}{n!} (A)^{n-1}.$

Implementation

This uses a Taylor expansion that iteratively adds terms with

while norm(Aⁿ) > ε
    mul!(A_temp, Aⁿ, A)
    Aⁿ .= A_temp
    rmul!(Aⁿ, inv(n))

    𝔄 += Aⁿ
    n += 1
end

until the norm of Aⁿ becomes smaller than machine precision. The counter n in the above algorithm is initialized as 2, and the matrices Aⁿ and 𝔄 are initialized as the identity matrix.
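For reference, here is a self-contained sketch of this iteration; the function name 𝔄_taylor and the keyword ε are ours for illustration and may differ from the actual library routine:

using LinearAlgebra: mul!, rmul!, norm

function 𝔄_taylor(A::AbstractMatrix{T}; ε = eps(T)) where T
    Aⁿ = one(A)      # current term Aⁿ⁻¹/n!, starts as the identity (n = 1)
    𝔄 = one(A)       # accumulated sum, initialized with the n = 1 term
    A_temp = zero(A)
    n = 2
    while norm(Aⁿ) > ε
        mul!(A_temp, Aⁿ, A)  # build the next power of A
        Aⁿ .= A_temp
        rmul!(Aⁿ, inv(n))    # divide by n to turn 1/(n-1)! into 1/n!
        𝔄 += Aⁿ
        n += 1
    end
    𝔄
end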

source
𝔄(B̂, B̄)

Compute $\mathfrak{A}(B', B'') := \sum_{n=1}^\infty \frac{1}{n!} ((B'')^TB')^{n-1}.$

This expression has the property $\mathbb{I} + B'\mathfrak{A}(B', B'')(B'')^T = \exp(B'(B'')^T).$

Examples

using GeometricMachineLearning
using GeometricMachineLearning: 𝔄
import Random
Random.seed!(123)

B = rand(StiefelLieAlgHorMatrix, 10, 2)
B̂ = hcat(vcat(.5 * B.A, B.B), vcat(one(B.A), zero(B.B)))
B̄ = hcat(vcat(one(B.A), zero(B.B)), vcat(-.5 * B.A, -B.B))

one(B̂ * B̄') + B̂ * 𝔄(B̂, B̄) * B̄' ≈ exp(Matrix(B))

# output

true
source

References

[10]
P.-A. Absil, R. Mahony and R. Sepulchre. Optimization algorithms on matrix manifolds (Princeton University Press, Princeton, New Jersey, 2008).
[15]
T. Bendokat and R. Zimmermann. The real symplectic Stiefel and Grassmann manifolds: metrics, geodesics and applications, arXiv preprint arXiv:2108.12447 (2021).
[16]
B. O'Neill. Semi-Riemannian geometry with applications to relativity (Academic Press, New York City, New York, 1983).