GeometricMachineLearning Library Functions

GeometricMachineLearning.AbstractRetractionType

AbstractRetraction is a type that comprises all retraction methods for manifolds. For every manifold layer one has to specify a retraction method that takes the layer and elements of the (global) tangent space.

source
GeometricMachineLearning.ActivationLayerPType
ActivationLayerP(n, σ)

Make an activation layer of size $n$ and with activation $σ$ that only changes the $p$ component.

Performs:

\[\begin{pmatrix} q \\ p \end{pmatrix} \mapsto \begin{pmatrix} q \\ p + \mathrm{diag}(a)\sigma(q) \end{pmatrix}.\]

source
GeometricMachineLearning.ActivationLayerQType
ActivationLayerQ(n, σ)

Make an activation layer of size $n$ and with activation $σ$ that only changes the $q$ component.

Performs:

\[\begin{pmatrix} q \\ p \end{pmatrix} \mapsto \begin{pmatrix} q + \mathrm{diag}(a)\sigma(p) \\ p \end{pmatrix}.\]
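
The following is a plain-Julia sketch of the map this layer realizes (the scaling vector a and the activation σ stand for the layer's weights; an illustration, not the library's internal implementation):

q, p, a = [1.0, 2.0], [0.5, -0.5], [0.1, 0.2]

q_new = q .+ a .* tanh.(p)   # only the q component changes
p_new = p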

source
GeometricMachineLearning.AdamCacheType
AdamCache(Y)

Store the first and second moment for Y (initialized as zeros).

First and second moments are called B₁ and B₂.

If the cache is called with an instance of a homogeneous space, e.g. the StiefelManifold $St(n,N)$ it initializes the moments as elements of $\mathfrak{g}^\mathrm{hor}$ (StiefelLieAlgHorMatrix).

Examples

using GeometricMachineLearning

Y = rand(StiefelManifold, 5, 3)
AdamCache(Y).B₁

# output

5×5 StiefelLieAlgHorMatrix{Float64, SkewSymMatrix{Float64, Vector{Float64}}, Matrix{Float64}}:
 0.0  -0.0  -0.0  -0.0  -0.0
 0.0   0.0  -0.0  -0.0  -0.0
 0.0   0.0   0.0  -0.0  -0.0
 0.0   0.0   0.0   0.0   0.0
 0.0   0.0   0.0   0.0   0.0
source
GeometricMachineLearning.AdamOptimizerType
AdamOptimizer(η, ρ₁, ρ₂, δ)

Make an instance of the Adam Optimizer.

Here the cache consists of first and second moments that are updated as

\[B_1 \gets ((\rho_1 - \rho_1^t)/(1 - \rho_1^t))\cdot{}B_1 + (1 - \rho_1)/(1 - \rho_1^t)\cdot{}\nabla{}L,\]

and

\[B_2 \gets ((\rho_2 - \rho_2^t)/(1 - \rho_2^t))\cdot{}B_2 + (1 - \rho_2)/(1 - \rho_2^t)\cdot\nabla{}L\odot\nabla{}L.\]

The final velocity is computed as:

\[\mathrm{velocity} \gets -\eta{}B_1/\sqrt{B_2 + \delta}.\]
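
A minimal sketch of these update rules on a plain array (the names and hyperparameter values are hypothetical; this illustrates the formulas above, not the library's internal implementation):

η, ρ₁, ρ₂, δ, t = 0.001, 0.9, 0.99, 1e-8, 1
∇L = randn(3, 3)
B₁, B₂ = zero(∇L), zero(∇L)

B₁ = (ρ₁ - ρ₁^t) / (1 - ρ₁^t) * B₁ + (1 - ρ₁) / (1 - ρ₁^t) * ∇L
B₂ = (ρ₂ - ρ₂^t) / (1 - ρ₂^t) * B₂ + (1 - ρ₂) / (1 - ρ₂^t) * ∇L .* ∇L
velocity = -η * B₁ ./ sqrt.(B₂ .+ δ)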

Implementation

The velocity is stored in the input to save memory.

Algorithm and suggested defaults are taken from [20, page 301].

source
GeometricMachineLearning.AdamOptimizerWithDecayType
AdamOptimizerWithDecay(n_epochs, η₁=1f-2, η₂=1f-6, ρ₁=9f-1, ρ₂=9.9f-1, δ=1f-8)

Make an instance of the Adam Optimizer with weight decay.

All except the first argument (the number of epochs) have defaults.

The difference to the standard AdamOptimizer is that we change the learning rate $\eta$ in each step. Apart from the time dependency of $\eta$ the two algorithms are equivalent. $\eta$ starts at a high value $\eta_1$ and then decreases exponentially until it reaches $\eta_2$, with

\[ \eta(t) = \gamma^t\eta_1,\]

where $\gamma = \exp(\log(\eta_2 / \eta_1) / \mathtt{n\_epochs}).$
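
A short sketch of this schedule with hypothetical values for η₁, η₂ and n_epochs:

η₁, η₂, n_epochs = 1f-2, 1f-6, 100
γ = exp(log(η₂ / η₁) / n_epochs)   # γ < 1, so η(t) decays
η(t) = γ^t * η₁
(η(0), η(n_epochs))                # ≈ (1f-2, 1f-6)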

source
GeometricMachineLearning.AutoEncoderType

An autoencoder [20] is a neural network consisting of an encoder $\Psi^e$ and a decoder $\Psi^d$. In the simplest case they are trained on some data set $\mathcal{D}$ to reduce the following error:

\[||\Psi^d\circ\Psi^e(\mathcal{D}) - \mathcal{D}||,\]

which we call the reconstruction error or autoencoder error (see the docs for AutoEncoderLoss) and $||\cdot||$ is some norm.

Implementation

Abstract AutoEncoder type. If a custom <:AutoEncoder architecture is implemented it should have the fields full_dim, reduced_dim, n_encoder_blocks and n_decoder_blocks. Further the routines encoder, decoder, encoder_parameters and decoder_parameters should be extended.

source
GeometricMachineLearning.AutoEncoderLossType

This loss should always be used together with a neural network of type AutoEncoder (and it is also the default for training such a network).

It simply computes:

\[\mathtt{AutoEncoderLoss}(nn\mathtt{::Loss}, input) = ||nn(input) - input||.\]

source
GeometricMachineLearning.BFGSCacheType
BFGSCache(B)

Make the cache for the BFGS optimizer based on the array B.

It stores an array for the gradient of the previous time step B and the inverse of the Hessian matrix H.

The cache for the inverse of the Hessian is initialized with the identity. The cache for the previous gradient information is initialized with the zero vector.

Note that the cache for H is changed iteratively, whereas the cache for B is newly assigned at every time step.

source
GeometricMachineLearning.BatchType

Batch is a struct whose functor acts on an instance of DataLoader to produce a sequence of training samples for training for one epoch.

The Constructor

The constructor for Batch is called with:

  • batch_size::Int
  • seq_length::Int (optional)
  • prediction_window::Int (optional)

The first one of these arguments is required; it indicates the number of training samples in a batch. If we deal with time series data then we can additionally supply a sequence length and a prediction window as input arguments to Batch. These indicate the number of input vectors and the number of output vectors.

The functor

An instance of Batch can be called on an instance of DataLoader to produce a sequence of samples that contain all the input data, i.e. for training for one epoch. The output of applying batch::Batch to dl::DataLoader is a tuple of vectors of integers. Each of these vectors contains two integers: the first is the time index and the second one is the parameter index.

source
GeometricMachineLearning.ClassificationType

Classification Layer that takes a matrix as an input and returns a vector that is used for MNIST classification.

It has the following arguments:

  • M: input dimension
  • N: output dimension
  • activation: the activation function

And the following optional argument:

  • average: If this is set to true, then the output is computed as $\frac{1}{N}\sum_{i=1}^N[input]_{\bullet{}i}$. If set to false (the default) it picks the last column of the input.
source
GeometricMachineLearning.ClassificationTransformerType

This is a transformer neural network for classification purposes. At the moment this is only used for training on MNIST, but can in theory be used for any classification problem.

It has to be called with a DataLoader that stores an input and an output tensor. The optional arguments are:

  • n_heads: The number of heads in the MultiHeadAttention (mha) layers. Default: 7.
  • n_layers: The number of transformer layers. Default: 16.
  • activation: The activation function. Default: softmax.
  • Stiefel: Whether the matrices in the mha layers are on the Stiefel manifold.
  • add_connection: Whether the input is added to the output of the mha layer (skip connection).
source
GeometricMachineLearning.DataLoaderType
DataLoader(data)

Make an instance based on a data set.

This is designed to make training convenient.

Constructor

The data loader can be called with various inputs:

  • A single vector: If the data loader is called with a single vector (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the second axis indicates parameter values and/or time steps and the system has a single degree of freedom (i.e. the system dimension is one).
  • A single matrix: If the data loader is called with a single matrix (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the first axis is assumed to indicate the degrees of freedom of the system and the second axis indicates parameter values and/or time steps.
  • A single tensor: If the data loader is called with a single tensor, then this is interpreted as an integration problem with the second axis indicating the time step and the third one indicating the parameters.
  • A tensor and a vector: This is a special case (MNIST classification problem). For the MNIST problem for example the input are $n_p$ matrices (first input argument) and $n_p$ integers (second input argument).
  • A NamedTuple with fields q and p: The NamedTuple contains (i) two matrices or (ii) two tensors.
  • An EnsembleSolution: The EnsembleSolution typically comes from GeometricProblems.

When we supply a single vector or a single matrix as input to DataLoader and further set autoencoder = false (keyword argument), then the data are stored as an integration problem and the second axis is assumed to indicate time steps.

Fields of DataLoader

The fields of the DataLoader struct are the following:

  • input: The input data with axes (i) system dimension, (ii) number of time steps and (iii) number of parameters.
  • output: The tensor that contains the output (supervised learning) - this may be of type Nothing if the constructor is only called with one tensor (unsupervised learning).
  • input_dim: The dimension of the system, i.e. what is taken as input by a regular neural network.
  • input_time_steps: The length of the entire time series (length of the second axis).
  • n_params: The number of parameters that are present in the data set (length of third axis)
  • output_dim: The dimension of the output tensor (first axis). If output is of type Nothing, then this is also of type Nothing.
  • output_time_steps: The size of the second axis of the output tensor. If output is of type Nothing, then this is also of type Nothing.

Implementation

Even though DataLoader can be called with inputs of various forms, internally it always stores tensors with three axes.

using GeometricMachineLearning

data = [1 2 3; 4 5 6]
dl = DataLoader(data)
dl.input

# output

[ Info: You have provided a matrix as input. The axes will be interpreted as (i) system dimension and (ii) number of parameters.
2×1×3 Array{Int64, 3}:
[:, :, 1] =
 1
 4

[:, :, 2] =
 2
 5

[:, :, 3] =
 3
 6
source
GeometricMachineLearning.DataLoaderMethod
DataLoader(data)

Make an instance based on a data set.

This is designed to make training convenient.

Constructor

The data loader can be called with various inputs:

  • A single vector: If the data loader is called with a single vector (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the second axis indicates parameter values and/or time steps and the system has a single degree of freedom (i.e. the system dimension is one).
  • A single matrix: If the data loader is called with a single matrix (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the first axis is assumed to indicate the degrees of freedom of the system and the second axis indicates parameter values and/or time steps.
  • A single tensor: If the data loader is called with a single tensor, then this is interpreted as an integration problem with the second axis indicating the time step and the third one indicating the parameters.
  • A tensor and a vector: This is a special case (MNIST classification problem). For the MNIST problem for example the input are $n_p$ matrices (first input argument) and $n_p$ integers (second input argument).
  • A NamedTuple with fields q and p: The NamedTuple contains (i) two matrices or (ii) two tensors.
  • An EnsembleSolution: The EnsembleSolution typically comes from GeometricProblems.

When we supply a single vector or a single matrix as input to DataLoader and further set autoencoder = false (keyword argument), then the data are stored as an integration problem and the second axis is assumed to indicate time steps.

source
GeometricMachineLearning.DataLoaderMethod
DataLoader(data::AbstractMatrix)

Make an instance of DataLoader based on a matrix.

Arguments

This has an additional keyword argument autoencoder, which is set to true by default.

source
GeometricMachineLearning.DecoderType

Abstract Decoder type.

Implementation

If a custom <:Decoder architecture is implemented it should have the fields full_dim, reduced_dim and n_decoder_blocks.

source
GeometricMachineLearning.EncoderType

Abstract Encoder type.

Implementation

If a custom <:Encoder architecture is implemented it should have the fields full_dim, reduced_dim and n_encoder_blocks.

source
GeometricMachineLearning.GSympNetType
GSympNet(d)

Make a $G$-SympNet with dimension $d.$

There exists an additional constructor that can be called by supplying an instance of DataLoader.

Arguments

Keyword arguments are:

  • upscaling_dimension::Int: The upscaling dimension of the gradient layer. See the documentation for GradientLayerQ and GradientLayerP for further explanation. The default is 2*dim.
  • n_layers::Int: The number of layers (i.e. the total number of GradientLayerQ and GradientLayerP). The default is 2.
  • activation: The activation function that is applied. By default this is tanh.
  • init_upper::Bool: Initialize the gradient layer so that it first modifies the $q$-component. The default is true.
source
GeometricMachineLearning.GlobalSectionType
GlobalSection(Y::AbstractMatrix)

Construct a global section for Y.

A global section $\lambda$ is a mapping from a homogeneous space $\mathcal{M}$ to the corresponding Lie group $G$ such that

\[\lambda(Y)E = Y,\]

where $E$ is the distinct element of the homogeneous space (see StiefelProjection).

Also see apply_section and global_rep.

Implementation

For an implementation of GlobalSection for a custom array (especially manifolds), the function global_section has to be generalized.

source
GeometricMachineLearning.GradientLayerPType
GradientLayerP(n, upscaling_dimension, activation)

Make an instance of a gradient-$p$ layer.

The gradient layer that changes the $p$ component. It is of the form:

\[\begin{bmatrix} \mathbb{I} & \mathbb{O} \\ \nabla{}V & \mathbb{I} \end{bmatrix},\]

with $V(q) = \sum_{i=1}^Ma_i\Sigma(\sum_jk_{ij}q_j+b_i)$, where $\Sigma$ is the antiderivative of the activation function $\sigma$ (one-layer neural network). We refer to $M$ as the upscaling dimension. Such layers are by construction symplectic.
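
A sketch of this one-layer potential and its gradient, which is the term added to the $p$ component (K, a and b stand for the layer weights; the names are hypothetical, not the library's internals):

Σ(x) = log(cosh(x))                              # antiderivative of σ = tanh
V(q, K, a, b) = sum(a .* Σ.(K * q .+ b))
∇V(q, K, a, b) = K' * (a .* tanh.(K * q .+ b))   # gradient of V with respect to q

K, a, b = randn(4, 2), randn(4), randn(4)        # upscaling dimension M = 4, system dimension 2
∇V(randn(2), K, a, b)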

source
GeometricMachineLearning.GradientLayerQType
GradientLayerQ(n, upscaling_dimension, activation)

Make an instance of a gradient-$q$ layer.

The gradient layer that changes the $q$ component. It is of the form:

\[\begin{bmatrix} \mathbb{I} & \nabla{}V \\ \mathbb{O} & \mathbb{I} \end{bmatrix},\]

with $V(p) = \sum_{i=1}^Ma_i\Sigma(\sum_jk_{ij}p_j+b_i)$, where $\Sigma$ is the antiderivative of the activation function $\sigma$ (one-layer neural network). We refer to $M$ as the upscaling dimension. Such layers are by construction symplectic.

source
GeometricMachineLearning.GradientOptimizerType
GradientOptimizer(η)

Make an instance of a gradient optimizer.

This is the simplest neural network optimizer. It has no cache and computes the final velocity as:

\[ \mathrm{velocity} \gets - \eta\nabla_\mathrm{weight}L.\]

Implementation

The operations are done as memory efficiently as possible. This means the provided $\nabla_WL$ is mutated via:

    rmul!(∇L, -method.η)
source
GeometricMachineLearning.GrassmannLieAlgHorMatrixType
GrassmannLieAlgHorMatrix(B::AbstractMatrix{T}, N::Integer, n::Integer) where T

Build an instance of GrassmannLieAlgHorMatrix based on an arbitrary matrix B of size $(N-n)\times{}n$.

GrassmannLieAlgHorMatrix is the horizontal component of the Lie algebra of skew-symmetric matrices (with respect to the canonical metric). The projection here is: $\pi:S \to SE/\sim$ where

\[E = \begin{pmatrix} \mathbb{I}_{n} \\ \mathbb{O}_{(N-n)\times{}n} \end{pmatrix},\]

and the equivalence relation is

\[V_1 \sim V_2 \iff \exists A\in\mathcal{S}_\mathrm{skew}(n) \text{ such that } V_2 = V_1 + \begin{pmatrix} A \\ \mathbb{O} \end{pmatrix}.\]

An element of GrassmannLieAlgHorMatrix takes the form:

\[\begin{pmatrix} \bar{\mathbb{O}} & B^T \\ B & \mathbb{O} \end{pmatrix},\]

where $\bar{\mathbb{O}}\in\mathbb{R}^{n\times{}n}$ and $\mathbb{O}\in\mathbb{R}^{(N - n)\times{}(N - n)}.$

source
GeometricMachineLearning.GrassmannLieAlgHorMatrixMethod
GrassmannLieAlgHorMatrix(D::AbstractMatrix, n::Integer)

Take a big matrix as input and build an instance of GrassmannLieAlgHorMatrix belonging to the GrassmannManifold $Gr(n, N)$ where $N$ is the number of rows of D.

If the constructor is called with a big $N\times{}N$ matrix, then the projection is performed the following way:

\[\begin{pmatrix} A & B_1 \\ B_2 & D \end{pmatrix} \mapsto \begin{pmatrix} \bar{\mathbb{O}} & -B_2^T \\ B_2 & \mathbb{O} \end{pmatrix}.\]

This can also be seen as the operation:

\[D \mapsto \Omega(E, DE - EE^TDE),\]

where $\Omega$ is the horizontal lift GeometricMachineLearning.Ω.

source
GeometricMachineLearning.HRedSysType

HRedSys computes the reconstructed dynamics in the full system based on the reduced one. Optionally it can be compared to the FOM (full-order model) solution.

It can be called using the following constructor: HRedSys(N, n; encoder, decoder, v_full, f_full, v_reduced, f_reduced, parameters, tspan, tstep, ics, projection_error) where

  • encoder: a function $\mathbb{R}^{2N}\mapsto{}\mathbb{R}^{2n}$
  • decoder: a (differentiable) function $\mathbb{R}^{2n}\mapsto\mathbb{R}^{2N}$
  • v_full: a (differentiable) mapping defined the same way as in GeometricIntegrators.
  • f_full: a (differentiable) mapping defined the same way as in GeometricIntegrators.
  • v_reduced: a (differentiable) mapping defined the same way as in GeometricIntegrators.
  • f_reduced: a (differentiable) mapping defined the same way as in GeometricIntegrators.
  • parameters: a NamedTuple that parametrizes the vector fields (the same for the full and the reduced vector field)
  • tspan: a tuple (t₀, tₗ) that specifies start and end point of the time interval over which integration is performed.
  • tstep: the time step
  • ics: the initial condition for the big system.
  • projection_error: the error $||M - \mathcal{R}\circ\mathcal{P}(M)||$ where $M$ is the snapshot matrix; $\mathcal{P}$ and $\mathcal{R}$ are the reduction and reconstruction respectively.
source
GeometricMachineLearning.LASympNetType
LASympNet(d)

Make an $LA$-SympNet with dimension $d.$

There exists an additional constructor that can be called by supplying an instance of DataLoader.

Arguments

Keyword arguments are:

  • depth::Int: The number of linear layers that are applied. The default is 5.
  • nhidden::Int: The number of hidden layers (i.e. layers that are not input or output layers). The default is 2.
  • activation: The activation function that is applied. By default this is tanh.
  • init_upper_linear::Bool: Initialize the linear layer so that it first modifies the $q$-component. The default is true.
  • init_upper_act::Bool: Initialize the activation layer so that it first modifies the $q$-component. The default is true.
source
GeometricMachineLearning.LinearLayerPType
LinearLayerP(n)

Make a linear layer of dimension $n\times{}n$ that only changes the $p$ component.

Equivalent to a left multiplication by the matrix:

\[\begin{pmatrix} \mathbb{I} & \mathbb{O} \\ B & \mathbb{I} \end{pmatrix}, \]

where $B$ is a symmetric matrix.

source
GeometricMachineLearning.LinearLayerQType
LinearLayerQ(n)

Make a linear layer of dimension $n\times{}n$ that only changes the $q$ component.

Equivalent to a left multiplication by the matrix:

\[\begin{pmatrix} \mathbb{I} & B \\ \mathbb{O} & \mathbb{I} \end{pmatrix}, \]

where $B$ is a symmetric matrix.
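
A small sketch of what this multiplication does to a vector $(q, p)$ (plain arrays with a hand-picked symmetric B; not the library's implementation):

q, p = [1.0, 2.0], [3.0, 4.0]
B = [1.0 2.0; 2.0 1.0]           # symmetric
q_new, p_new = q + B * p, p      # [I B; O I] * [q; p] = [q + Bp; p]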

source
GeometricMachineLearning.LinearSymplecticAttentionType

Implements the linear symplectic attention layers. Analogous to GradientLayer it performs mappings that only change the $Q$ or the $P$ part. For more information see LinearSymplecticAttentionQ and LinearSymplecticAttentionP.

Constructor

For the constructors simply call

LinearSymplecticAttentionQ(sys_dim, seq_length)

or

LinearSymplecticAttentionP(sys_dim, seq_length)

where sys_dim is the system dimension and seq_length is the sequence length.

source
GeometricMachineLearning.LinearSymplecticTransformerType

Realizes the linear Symplectic Transformer.

Constructor:

The constructor is called with the following arguments

  1. dim::Int: System dimension
  2. seq_length::Int: Number of time steps that the transformer considers.

Optional keyword arguments:

  • n_sympnet::Int=2: The number of sympnet layers in the transformer.
  • upscaling_dimension::Int=2*dim: The upscaling that is done by the gradient layer.
  • L::Int=1: The number of transformer units.
  • activation=tanh: The activation function for the SympNet layers.
  • init_upper::Bool=true: Specifies if the first layer is a $Q$-type layer (init_upper=true) or if it is a $P$-type layer (init_upper=false).
source
GeometricMachineLearning.LowerTriangularType
LowerTriangular(S::AbstractVector, n::Int)

Build a lower-triangular matrix from a vector.

A lower-triangular matrix is an $n\times{}n$ matrix that has zeros on the diagonal and in the upper triangular part.

The data are stored in a vector $S$ similarly to other matrices. See UpperTriangular, SkewSymMatrix and SymmetricMatrix.

The struct has two fields: S and n. The first stores all the entries of the matrix in a sparse fashion (in a vector) and the second is the dimension $n$ for $A\in\mathbb{R}^{n\times{}n}$.

Examples

using GeometricMachineLearning
S = [1, 2, 3, 4, 5, 6]
LowerTriangular(S, 4)

# output

4×4 LowerTriangular{Int64, Vector{Int64}}:
 0  0  0  0
 1  0  0  0
 2  3  0  0
 4  5  6  0
source
GeometricMachineLearning.LowerTriangularMethod
LowerTriangular(A::AbstractMatrix)

Build a lower-triangular matrix from a matrix.

This is done by taking the strictly lower-triangular part (below the diagonal) of that matrix.

Examples

using GeometricMachineLearning
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
LowerTriangular(M)

# output

4×4 LowerTriangular{Int64, Vector{Int64}}:
  0   0   0  0
  5   0   0  0
  9  10   0  0
 13  14  15  0
source
GeometricMachineLearning.MomentumOptimizerType
MomentumOptimizer(η, α)

Make an instance of the momentum optimizer.

The momentum optimizer is similar to the GradientOptimizer. It however has a nontrivial cache that stores past history (see MomentumCache). The cache is updated via:

\[ B^{\mathrm{cache}} \gets \alpha{}B^{\mathrm{cache}} + \nabla_\mathrm{weights}L\]

and then the final velocity is computed as

\[ \mathrm{velocity} \gets - \eta{}B^{\mathrm{cache}}.\]

If applicable, the Riemannian manifold equivalent of this update is used.

Implementation

To save memory the velocity is stored in the input $\nabla_WL$. This is similar to the case of the GradientOptimizer.
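
A minimal sketch of the update described above on a plain array (hypothetical names and values; illustrative only):

η, α = 0.001, 0.5
∇L = randn(3, 3)
B_cache = zero(∇L)

B_cache = α * B_cache + ∇L
velocity = -η * B_cache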

source
GeometricMachineLearning.MultiHeadAttentionType

MultiHeadAttention (MHA) serves as a preprocessing step in the transformer. It reweights the input vectors based on correlations within those data.

Constructor

Takes input arguments:

  • dim::Int: The system dimension
  • n_heads::Int: The number of heads.
  • Stiefel::Bool=true (keyword argument): whether the weights should be put on the Stiefel manifold.
  • retraction::AbstractRetraction (keyword argument): what kind of retraction should be used. By default this is the geodesic retraction.
  • add_connection::Bool=true (keyword argument): determines if the input should be added to the output for the final result.
source
GeometricMachineLearning.NetworkLossType

An abstract type for all the neural network losses. If you want to implement CustomLoss <: NetworkLoss you need to define a functor:

    (loss::CustomLoss)(model, ps, input, output)

where model is an instance of an AbstractExplicitLayer or a Chain and ps the parameters.

source
GeometricMachineLearning.OptimizerType
Optimizer(method, cache, step, retraction)

Store the method (e.g. AdamOptimizer with corresponding hyperparameters), the cache (e.g. AdamCache), the optimization step and the retraction.

It takes as input an optimization method and the parameters of a network.

For technical reasons we first specify an OptimizerMethod that stores all the hyperparameters of the optimizer.

source
GeometricMachineLearning.OptimizerMethod

A functor for Optimizer. It is called with:

  • nn::NeuralNetwork
  • dl::DataLoader
  • batch::Batch
  • n_epochs::Int
  • loss

The last argument is a function through which Zygote differentiates. This argument is optional; if it is not supplied GeometricMachineLearning defaults to an appropriate loss for the DataLoader.

source
GeometricMachineLearning.OptimizerMethod
Optimizer(method, nn::NeuralNetwork)

Allocate the cache for a specific method and a NeuralNetwork for an instance of Optimizer.

Internally this calls Optimizer(method, nn.params).

Typically the Optimizer is not initialized with the network parameters, but instead with a NeuralNetwork struct.

source
GeometricMachineLearning.PSDArchType

The architecture

Proper symplectic decomposition (PSD) can be seen as a SymplecticAutoencoder for which the decoder and the encoder are both PSD-like matrices (see the docs for PSDLayer).

Training

For optimizing the parameters in this architecture no neural network training is necessary (see the docs for solve!).

The constructor

The constructor only takes two arguments as input:

  • full_dim::Integer
  • reduced_dim::Integer
source
GeometricMachineLearning.PSDLayerType

This is a PSD-like layer used for symplectic autoencoders. One layer has the following shape:

\[A = \begin{bmatrix} \Phi & \mathbb{O} \\ \mathbb{O} & \Phi \end{bmatrix},\]

where $\Phi$ is an element of the Stiefel manifold $St(n, N)$.

The constructor of PSDLayer is called by PSDLayer(M, N; retraction=retraction):

  • M is the input dimension.
  • N is the output dimension.
  • retraction is an instance of a struct with supertype AbstractRetraction. The only options at the moment are Geodesic() and Cayley().
source
GeometricMachineLearning.ReducedLossType
ReducedLoss(encoder, decoder)

Make an instance of ReducedLoss based on an Encoder and a Decoder.

This loss should be used together with a NeuralNetworkIntegrator or TransformerIntegrator.

The loss is computed as:

\[\mathrm{loss}_{\mathcal{E}, \mathcal{D}}(\mathcal{NN}, \mathrm{input}, \mathrm{output}) = ||\mathcal{D}(\mathcal{NN}(\mathcal{E}(\mathrm{input}))) - \mathrm{output}||,\]

where $\mathcal{E}$ is the Encoder, $\mathcal{D}$ is the Decoder. $\mathcal{NN}$ is the neural network we compute the loss of.

source
GeometricMachineLearning.ResNetType

A ResNet is a neural network that realizes a mapping of the form: $x \mapsto \mathcal{NN}(x) + x$, so the input is again added to the output (a so-called add connection). In GeometricMachineLearning the specific ResNet that we use consists of a series of simple ResNetLayers.

source
GeometricMachineLearning.SkewSymMatrixType
SkewSymMatrix(S::AbstractVector, n::Integer)

Instantiate a skew-symmetric matrix with information stored in vector S.

A skew-symmetric matrix $A$ is a matrix that satisfies $A^T = -A$.

Internally the struct saves a vector $S$ of size $n(n-1)\div2$. The conversion is done the following way:

\[[A]_{ij} = \begin{cases} 0 & \text{if $i=j$} \\ S[( (i-2) (i-1) ) \div 2 + j] & \text{if $i>j$}\\ -S[( (j-2) (j-1) ) \div 2 + i] & \text{else}. \end{cases}\]

Also see SymmetricMatrix, LowerTriangular and UpperTriangular.

Examples

using GeometricMachineLearning
S = [1, 2, 3, 4, 5, 6]
SkewSymMatrix(S, 4)

# output

4×4 SkewSymMatrix{Int64, Vector{Int64}}:
 0  -1  -2  -4
 1   0  -3  -5
 2   3   0  -6
 4   5   6   0
source
GeometricMachineLearning.SkewSymMatrixMethod
SkewSymMatrix(A::AbstractMatrix)

Perform 0.5 * (A - A') and store the matrix in an efficient way (as a vector with $n(n-1)/2$ entries).

If the constructor is called with a matrix as input it returns a skew-symmetric matrix via the projection:

\[A \mapsto \frac{1}{2}(A - A^T).\]

Examples

using GeometricMachineLearning
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
SkewSymMatrix(M)

# output

4×4 SkewSymMatrix{Float64, Vector{Float64}}:
 0.0  -1.5  -3.0  -4.5
 1.5   0.0  -1.5  -3.0
 3.0   1.5   0.0  -1.5
 4.5   3.0   1.5   0.0

Extend help

Note that the constructor is designed in such a way that it always returns matrices of type SkewSymMatrix{<:AbstractFloat} when called with a matrix, even if this matrix is of type AbstractMatrix{<:Integer}.

If the user wishes to allocate a matrix SkewSymMatrix{<:Integer} the constructor SkewSymMatrix(::AbstractVector, n::Integer) has to be called.

source
GeometricMachineLearning.StandardTransformerIntegratorType

The regular transformer used as an integrator (multi-step method).

The constructor is called with one argument:

  • sys_dim::Int

The following are keyword arguments:

  • transformer_dim::Int: the default is transformer_dim = sys_dim.
  • n_blocks::Int: The default is 1.
  • n_heads::Int: the number of heads in the multihead attention layer (default is n_heads = sys_dim).
  • L::Int: the number of transformer blocks (default is L = 2).
  • upscaling_activation: by default identity
  • resnet_activation: by default tanh
  • add_connection::Bool=true: if the input should be added to the output.
source
GeometricMachineLearning.StiefelLieAlgHorMatrixType
StiefelLieAlgHorMatrix(A::SkewSymMatrix{T}, B::AbstractMatrix{T}, N::Integer, n::Integer) where T

Build an instance of StiefelLieAlgHorMatrix based on a skew-symmetric matrix A and an arbitrary matrix B.

StiefelLieAlgHorMatrix is the horizontal component of the Lie algebra of skew-symmetric matrices (with respect to the canonical metric). The projection here is: $\pi:S \to SE$ where

\[E = \begin{pmatrix} \mathbb{I}_{n} \\ \mathbb{O}_{(N-n)\times{}n} \end{pmatrix}.\]

The matrix $E$ is implemented under StiefelProjection in GeometricMachineLearning.

An element of StiefelLieAlgHorMatrix takes the form:

\[\begin{pmatrix} A & B^T \\ B & \mathbb{O} \end{pmatrix},\]

where $A$ is skew-symmetric (this is SkewSymMatrix in GeometricMachineLearning).

Also see GrassmannLieAlgHorMatrix.

source
GeometricMachineLearning.StiefelLieAlgHorMatrixMethod
StiefelLieAlgHorMatrix(D::AbstractMatrix, n::Integer)

Take a big matrix as input and build an instance of StiefelLieAlgHorMatrix belonging to the StiefelManifold $St(n, N)$ where $N$ is the number of rows of D.

If the constructor is called with a big $N\times{}N$ matrix, then the projection is performed the following way:

\[\begin{pmatrix} A & B_1 \\ B_2 & D \end{pmatrix} \mapsto \begin{pmatrix} \mathrm{skew}(A) & -B_2^T \\ B_2 & \mathbb{O} \end{pmatrix}.\]

The operation $\mathrm{skew}:\mathbb{R}^{n\times{}n}\to\mathcal{S}_\mathrm{skew}(n)$ is the skew-symmetrization operation. This is equivalent to calling SkewSymMatrix with an $n\times{}n$ matrix.

This can also be seen as the operation:

\[D \mapsto \Omega(E, DE) = \mathrm{skew}\left(2 \left(\mathbb{I} - \frac{1}{2} E E^T \right) DE E^T\right).\]

Also see GeometricMachineLearning.Ω.

source
GeometricMachineLearning.StiefelManifoldType

An implementation of the Stiefel manifold [7]. The Stiefel manifold is the collection of all matrices $Y\in\mathbb{R}^{N\times{}n}$ whose columns are orthonormal, i.e.

\[ St(n, N) = \{Y: Y^TY = \mathbb{I}_n \}.\]

The Stiefel manifold can be shown to have manifold structure (as the name suggests) and this is heavily used in GeometricMachineLearning. It is further a compact space. More information can be found in the docstrings for rgrad(::StiefelManifold, ::AbstractMatrix) and metric(::StiefelManifold, ::AbstractMatrix, ::AbstractMatrix).

source
GeometricMachineLearning.StiefelProjectionType
StiefelProjection(backend, T, N, n)

Make a matrix of the form $\begin{bmatrix} \mathbb{I} & \mathbb{O} \end{bmatrix}^T$ for a specific backend and data type.

An array that essentially does vcat(I(n), zeros(N-n, n)) with GPU support.

Extend help

Technically this should be a subtype of StiefelManifold.

source
GeometricMachineLearning.StiefelProjectionMethod
StiefelProjection(A::AbstractMatrix)

Extract necessary information from A and build an instance of StiefelProjection.

Necessary information here refers to the backend, the data type and the size of the matrix.

source
GeometricMachineLearning.StiefelProjectionMethod
StiefelProjection(B::AbstractLieAlgHorMatrix)

Extract necessary information from B and build an instance of StiefelProjection.

Necessary information here refers to the backend, the data type and the size of the matrix.

The size is queried through B.N and B.n.

Examples

using GeometricMachineLearning

B₁ = rand(StiefelLieAlgHorMatrix, 5, 2)
B₂ = rand(GrassmannLieAlgHorMatrix, 5, 2)
E = [1. 0.; 0. 1.; 0. 0.; 0. 0.; 0. 0.]

StiefelProjection(B₁) ≈ StiefelProjection(B₂) ≈ E 

# output

true
source
GeometricMachineLearning.SymmetricMatrixType
SymmetricMatrix(S::AbstractVector, n::Integer)

Instantiate a symmetric matrix with information stored in vector S.

A SymmetricMatrix $A$ is a matrix that satisfies $A^T = A$.

Internally the struct saves a vector $S$ of size $n(n+1)\div2$. The conversion is done the following way:

\[[A]_{ij} = \begin{cases} S[( (i-1) i ) \div 2 + j] & \text{if $i\geq{}j$}\\ S[( (j-1) j ) \div 2 + i] & \text{else}. \end{cases}\]

So $S$ stores a string of vectors taken from $A$: $S = [\tilde{a}_1, \tilde{a}_2, \ldots, \tilde{a}_n]$ with $\tilde{a}_i = [[A]_{i1},[A]_{i2},\ldots,[A]_{ii}]$.

Also see SkewSymMatrix, LowerTriangular and UpperTriangular.

Examples

using GeometricMachineLearning
S = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
SymmetricMatrix(S, 4)

# output

4×4 SymmetricMatrix{Int64, Vector{Int64}}:
 1  2  4   7
 2  3  5   8
 4  5  6   9
 7  8  9  10
source
GeometricMachineLearning.SymmetricMatrixMethod
SymmetricMatrix(A::AbstractMatrix)

Perform 0.5 * (A + A') and store the matrix in an efficient way (as a vector with $n(n+1)/2$ entries).

If the constructor is called with a matrix as input it returns a symmetric matrix via the projection:

\[A \mapsto \frac{1}{2}(A + A^T).\]

Examples

using GeometricMachineLearning
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
SymmetricMatrix(M)

# output

4×4 SymmetricMatrix{Float64, Vector{Float64}}:
 1.0   3.5   6.0   8.5
 3.5   6.0   8.5  11.0
 6.0   8.5  11.0  13.5
 8.5  11.0  13.5  16.0

Extend help

Note that the constructor is designed in such a way that it always returns matrices of type SymmetricMatrix{<:AbstractFloat} when called with a matrix, even if this matrix is of type AbstractMatrix{<:Integer}.

If the user wishes to allocate a matrix SymmetricMatrix{<:Integer} the constructor SymmetricMatrix(::AbstractVector, n::Integer) has to be called.

source
GeometricMachineLearning.SympNetType

The SympNet type encompasses GSympNets and LASympNets. SympNets are universal approximators of symplectic flows, i.e. maps $\varphi:\mathbb{R}^{2n}\to\mathbb{R}^{2n}$ for which $(\nabla\varphi)^T\mathbb{J}\nabla\varphi = \mathbb{J}$ holds.

source
GeometricMachineLearning.SymplecticAutoencoderType
SymplecticAutoencoder(full_dim, reduced_dim)

Make an instance of SymplecticAutoencoder for dimensions full_dim and reduced_dim (those have to be provided as Integers).

The architecture

The symplectic autoencoder architecture was introduced in [40]. Like any other autoencoder it consists of an encoder $\Psi^e:\mathbb{R}^{2N}\to\mathbb{R}^{2n}$ and a decoder $\Psi^d:\mathbb{R}^{2n}\to\mathbb{R}^{2N}$ with $n\ll{}N$. These satisfy the following properties:

\[\nabla_z\Psi^e\mathbb{J}_{2N}(\nabla_z\Psi^e)^T = \mathbb{J}_{2n} \text{ and } (\nabla_\xi\Psi^d)^T\mathbb{J}_{2N}\nabla_\xi\Psi^d = \mathbb{J}_{2n}.\]

Because the decoder has this particular property, the reduced system can be described by the Hamiltonian $H\circ\Psi^d$:

\[\mathbb{J}_{2n}\nabla_\xi(H\circ\Psi^d) = \mathbb{J}_{2n}(\nabla_\xi\Psi^d)^T\nabla_{\Psi^d(\xi)}H = \mathbb{J}_{2n}(\nabla_\xi\Psi^d)^T\mathbb{J}_{2N}^T\mathbb{J}_{2N}\nabla_{\Psi^d(\xi)}H = (\nabla_\xi\Psi^d)^+X_H(\Psi^d(\xi)),\]

where $(\nabla_\xi\Psi^d)^+$ is the pseudoinverse of $\nabla_\xi\Psi^d$ (for more details see the docs on the AutoEncoder type).

Arguments

You can provide the following keyword arguments:

  • n_encoder_layers::Integer = 4: The number of layers in one encoder block.
  • n_encoder_blocks::Integer = 2: The number of encoder blocks.
  • n_decoder_layers::Integer = 1: The number of layers in one decoder block.
  • n_decoder_blocks::Integer = 3: The number of decoder blocks.
  • sympnet_upscale::Integer = 5: The upscaling dimension of the GSympNet. See GradientLayerQ and GradientLayerP.
  • activation = tanh: The activation in the gradient layers.
  • encoder_init_q::Bool = true: Specifies if the first layer in each encoder block should be of $q$ type.
  • decoder_init_q::Bool = true: Specifies if the first layer in each decoder block should be of $q$ type.
source
GeometricMachineLearning.TransformerLossType
TransformerLoss(seq_length, prediction_window)

Make an instance of the transformer loss.

This is the loss for a transformer network (especially a transformer integrator).

Parameters

The prediction_window specifies how many time steps are predicted into the future. It defaults to the value specified for seq_length.

source
GeometricMachineLearning.UpperTriangularType
UpperTriangular(S::AbstractVector, n::Int)

Build an upper-triangular matrix from a vector.

An upper-triangular matrix is an $n\times{}n$ matrix that has zeros on the diagonal and in the lower triangular part.

The data are stored in a vector $S$ similarly to other matrices. See LowerTriangular, SkewSymMatrix and SymmetricMatrix.

The struct has two fields: S and n. The first stores all the entries of the matrix in a sparse fashion (in a vector) and the second is the dimension $n$ for $A\in\mathbb{R}^{n\times{}n}$.

Examples

using GeometricMachineLearning
S = [1, 2, 3, 4, 5, 6]
UpperTriangular(S, 4)

# output

4×4 UpperTriangular{Int64, Vector{Int64}}:
 0  1  2  4
 0  0  3  5
 0  0  0  6
 0  0  0  0
source
GeometricMachineLearning.UpperTriangularMethod
UpperTriangular(A::AbstractMatrix)

Build an upper-triangular matrix from a matrix.

This is done by taking the strictly upper-triangular part (above the diagonal) of that matrix.

Examples

using GeometricMachineLearning
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
UpperTriangular(M)

# output

4×4 UpperTriangular{Int64, Vector{Int64}}:
 0  2  3   4
 0  0  7   8
 0  0  0  12
 0  0  0   0
source
GeometricMachineLearning.VolumePreservingAttentionType

Volume-preserving attention (single head attention)

Drawbacks:

  • the super fast activation is only implemented for sequence lengths of 2, 3, 4 and 5.
  • other sequence lengths only work on CPU for now (lu decomposition has to be implemented to work for tensors in parallel).

Constructor

The constructor is called with:

  • dim::Int: The system dimension
  • seq_length::Int: The sequence length to be considered. The default is zero, i.e. arbitrary sequence lengths; this works for all sequence lengths but doesn't apply the super-fast activation.
  • skew_sym::Bool (keyword argument): specifies if the weight matrix is skew-symmetric or arbitrary (default is false).

Functor

Applying a layer of type VolumePreservingAttention does the following:

  • First we perform the operation $X \mapsto X^T A X =: C$, where $X\in\mathbb{R}^{N\times\mathtt{seq\_length}}$ is a matrix containing time series data and $A$ is the skew-symmetric matrix associated with the layer.
  • In a second step we compute the Cayley transform of $C$; $\Lambda = \mathrm{Cayley}(C)$.
  • The output of the layer is then $X\Lambda$.
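
A plain-matrix sketch of these three steps (the weight matrix, the Cayley convention and the sizes are assumptions for illustration; this is not the library's implementation):

using LinearAlgebra

N, seq_length = 4, 3
X = randn(N, seq_length)                   # time series data
A = let S = randn(N, N); (S - S') / 2 end  # skew-symmetric weight matrix

C = X' * A * X                             # step 1: C is skew-symmetric
Λ = (I - C / 2) \ (I + C / 2)              # step 2: one common form of the Cayley transform
X * Λ                                      # step 3: output of the layer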
source
GeometricMachineLearning.VolumePreservingFeedForwardType

Realizes a volume-preserving neural network as a combination of VolumePreservingLowerLayer and VolumePreservingUpperLayer.

Constructor

The constructor is called with the following arguments:

  • sys_dim::Int: The system dimension.
  • n_blocks::Int: The number of blocks in the neural network (containing linear layers and nonlinear layers). Default is 1.
  • n_linear::Int: The number of linear VolumePreservingLowerLayers and VolumePreservingUpperLayers in one block. Default is 1.
  • activation: The activation function for the nonlinear layers in a block.
  • init_upper::Bool=false (keyword argument): Specifies if the first layer is lower or upper.
source
GeometricMachineLearning.VolumePreservingFeedForwardLayerType

Super-type of VolumePreservingLowerLayer and VolumePreservingUpperLayer. The layers do the following:

\[x \mapsto \begin{cases} \sigma(Lx + b) & \text{where $L$ is }\mathtt{LowerTriangular} \\ \sigma(Ux + b) & \text{where $U$ is }\mathtt{UpperTriangular}. \end{cases}\]

The functor can be applied to a vector, a matrix or a tensor.
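
A sketch of the lower-triangular branch of this map, using the library's LowerTriangular with hand-picked weights (illustrative only, not a trained layer):

using GeometricMachineLearning

L = LowerTriangular([1.0, 2.0, 3.0], 3)   # strictly lower-triangular 3×3 weight
b = [0.1, 0.2, 0.3]
x = [1.0, -1.0, 0.5]
tanh.(L * x .+ b)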

Constructor

The constructors are called with:

  • sys_dim::Int: the system dimension.
  • activation=tanh: the activation function.
  • include_bias::Bool=true (keyword argument): specifies whether a bias should be used.
source
GeometricMachineLearning.VolumePreservingTransformerType

The volume-preserving transformer with the Cayley activation function and built-in upscaling.

Constructor

The arguments for the constructor are:

  1. sys_dim::Int
  2. seq_length::Int: The sequence length of the data fed into the transformer.

The following are keyword arguments:

  • n_blocks::Int=1: The number of blocks in one transformer unit (containing linear layers and nonlinear layers). Default is 1.
  • n_linear::Int=1: The number of linear VolumePreservingLowerLayers and VolumePreservingUpperLayers in one block. Default is 1.
  • L::Int=1: The number of transformer units.
  • activation=tanh: The activation function.
  • init_upper::Bool=false: Specifies if the network first acts on the $q$ component.
  • skew_sym::Bool=false: specifies if the weight matrix is skew-symmetric or arbitrary.
source
AbstractNeuralNetworks.update!Method
update!(o, cache, dx::AbstractArray)

Update the cache based on the gradient information dx, compute the final velocity and store it in dx.

The optimizer o is needed because some updating schemes (such as AdamOptimizer) also need information on the current time step.

source
AbstractNeuralNetworks.update!Method
update!(o, cache, B)

First update the cache and then update the array B based on the optimizer o.

Note that $B\in\mathfrak{g}^\mathrm{hor}$ in general.

source
AbstractNeuralNetworks.update!Method
update!(o::Optimizer{<:BFGSOptimizer}, C, B)

Perform an update with the BFGS optimizer.

C is the cache, B contains the gradient information (the output of global_rep in general).

First we compute the final velocity with

    vecS = -o.method.η * C.H * vec(B)

and then we update H

    C.H .= (𝕀 - ρ * SY) * C.H * (𝕀 - ρ * SY') + ρ * vecS * vecS'

where SY is vecS * Y' and 𝕀 is the identity.

Implementation

For stability we use δ for computing ρ:

    ρ = 1. / (vecS' * Y + o.method.δ)

This is similar to the AdamOptimizer.

Extend Help

If we have weights on a Manifold then the updates are slightly more difficult. In this case the vec operation has to be generalized to the corresponding global tangent space.

source
Base.:*Method
*(λY, Y)

Apply the element λY onto Y.

Here λY is an element of a Lie group and Y is an element of a homogeneous space.

source
Base.iterateMethod

This function computes a trajectory for a Transformer that has already been trained for validation purposes.

It takes as input:

  • nn: a NeuralNetwork (that has been trained).
  • ics: initial conditions (a matrix in $\mathbb{R}^{2n\times\mathtt{seq\_length}}$ or NamedTuple of two matrices in $\mathbb{R}^{n\times\mathtt{seq\_length}}$)
  • n_points::Int=100 (keyword argument): The number of steps for which we run the prediction.
  • prediction_window::Int=size(ics.q, 2): The prediction window (i.e. the number of steps we predict into the future) is equal to the sequence length (i.e. the number of input time steps) by default.
source
Base.iterateMethod

This function computes a trajectory for a SympNet that has already been trained for validation purposes.

It takes as input:

  • nn: a NeuralNetwork (that has been trained).
  • ics: initial conditions (a NamedTuple of two vectors)
source
Base.randMethod
rand(backend::KernelAbstractions.Backend, manifold_type::Type{MT}, N::Integer, n::Integer) where MT <: Manifold

Draw random elements for a specific device.

Examples

using GeometricMachineLearning
using GeometricMachineLearning: _round # hide
import Random
Random.seed!(123)

N, n = 5, 3
Y = rand(CPU(), StiefelManifold{Float32}, N, n)
_round(Y; digits = 5) # hide

# output

5×3 StiefelManifold{Float32, Matrix{Float32}}:
 -0.27575   0.32991   0.77275
 -0.62485  -0.33224  -0.0686
 -0.69333   0.36724  -0.18988
 -0.09295  -0.73145   0.46064
  0.2102    0.33301   0.38717

Random elements of the manifold can also be allocated on GPU, via e.g. ...

rand(CUDABackend(), StiefelManifold{Float32}, N, n)

... for drawing elements on a CUDA device.

source
Base.randMethod
rand(manifold_type::Type{MT}, N::Integer, n::Integer) where MT <: Manifold

Draw random elements from the Stiefel and the Grassmann manifold.

Because both of these manifolds are compact spaces we can sample them uniformly [8].

Examples

When we call ...

using GeometricMachineLearning
using GeometricMachineLearning: _round # hide
import Random
Random.seed!(123)

N, n = 5, 3
Y = rand(StiefelManifold{Float32}, N, n)
_round(Y; digits = 5) # hide

# output

5×3 StiefelManifold{Float32, Matrix{Float32}}:
 -0.27575   0.32991   0.77275
 -0.62485  -0.33224  -0.0686
 -0.69333   0.36724  -0.18988
 -0.09295  -0.73145   0.46064
  0.2102    0.33301   0.38717

... the sampling is done by first allocating a random matrix of size $N\times{}n$ via Y = randn(Float32, N, n). We then perform a QR decomposition Q, R = qr(Y) with the qr function from the LinearAlgebra package (this is using Householder reflections internally). The final output are then the first n columns of the Q matrix.
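
A sketch of this sampling procedure with plain LinearAlgebra calls (not the library's internal code):

using LinearAlgebra

N, n = 5, 3
A = randn(Float32, N, n)
F = qr(A)                        # Householder-based QR decomposition
Y_sample = Matrix(F.Q)[:, 1:n]   # first n columns of the Q matrix
Y_sample' * Y_sample ≈ I(n)      # the columns are orthonormal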

source
Base.vecMethod

If vec is applied onto Triangular, then the output is the associated vector.

source
Base.vecMethod

If vec is applied onto SkewSymMatrix, then the output is the associated vector.

source
GeometricMachineLearning.GradientFunction

This is an old constructor and will be deprecated. For change_q=true it is equivalent to GradientLayerQ; for change_q=false it is equivalent to GradientLayerP.

If full_grad=false then ActivationLayer is called.

source
GeometricMachineLearning.TransformerMethod

The architecture for a "transformer encoder" is essentially taken from arXiv:2010.11929, but with the difference that no layer normalization is employed. This is because we still need to find a generalization of layer normalization to manifolds.

The transformer is called with the following inputs:

  • dim: the dimension of the transformer
  • n_heads: the number of heads
  • L: the number of transformer blocks

In addition we have the following optional arguments:

  • activation: the activation function used for the ResNet (tanh by default)
  • Stiefel::Bool: if the matrices $P^V$, $P^Q$ and $P^K$ should live on a manifold (false by default)
  • retraction: which retraction should be used (Geodesic() by default)
  • add_connection::Bool: if the input should be added to the output after the MultiHeadAttention layer is used (true by default)
  • use_bias::Bool: If the ResNet should use a bias (true by default)
source
GeometricMachineLearning.apply_layer_to_nt_and_return_arrayMethod

This function is used in the wrappers where the input to the SympNet layers is not a NamedTuple (as it should be) but an AbstractArray.

It converts the Array to a NamedTuple (via assign_q_and_p), then calls the SympNet routine(s) and converts back to an AbstractArray (with vcat).

source
GeometricMachineLearning.apply_sectionMethod
apply_section(λY::GlobalSection{T, AT}, Y₂::AT) where {T, AT <: StiefelManifold{T}}

Apply λY to Y₂.

Mathematically this is the group action of the element $\lambda{}Y\in{}G$ on the element $Y_2$ of the homogeneous space $\mathcal{M}$.

Internally it calls the inplace version apply_section!.

source
GeometricMachineLearning.assign_batch_kernel!Method

Takes as input a batch tensor (to which the data are assigned), the whole data tensor and two vectors params and time_steps that include the specific parameters and time steps we want to assign.

Note that this assigns sequential data, e.g. for being processed by a transformer.

source
GeometricMachineLearning.assign_output_estimateMethod

The function assign_output_estimate is closely related to the transformer. It takes the last prediction_window columns of the output and uses them for the final prediction, i.e.

\[\mathbb{R}^{N\times{}T}\to\mathbb{R}^{N\times\mathtt{pw}}, \begin{bmatrix} z^{(1)}_1 & \cdots & z^{(T)}_1 \\ \cdots & \cdots & \cdots \\ z^{(1)}_n & \cdots & z^{(T)}_n \end{bmatrix} \mapsto \begin{bmatrix} z^{(T - \mathtt{pw})}_1 & \cdots & z^{(T)}_1 \\ \cdots & \cdots & \cdots \\ z^{(T - \mathtt{pw})}_n & \cdots & z^{(T)}_n\end{bmatrix} \]

source
GeometricMachineLearning.assign_q_and_pMethod

Allocates two new arrays q and p whose first dimension is half of that of the input x. This should also be supplied through the second argument N.

The output is a Tuple containing q and p.

source
GeometricMachineLearning.build_v_reducedMethod

Builds the reduced vector field based on the full vector field for a Hamiltonian system. We derive the reduced vector field via the reduced Hamiltonian: $\tilde{H} := H\circ\Psi^\mathrm{dec}$. We then get

\[\mathbb{J}_{2n}\nabla_\xi\tilde{H} = \mathbb{J}_{2n}(\nabla\Psi^\mathrm{dec})^T\mathbb{J}_{2N}^T\mathbb{J}_{2N}\nabla_z{}H = \mathbb{J}_{2n}(\nabla\Psi^\mathrm{dec})^T\mathbb{J}_{2N}^T \begin{pmatrix} v(z) \\ f(z) \end{pmatrix} = \begin{pmatrix} - (\nabla_p\Psi_q)^Tf(z) + (\nabla_p\Psi_p)^Tv(z) \\ (\nabla_q\Psi_q)^Tf(z) - (\nabla_q\Psi_p)^Tv(z) \end{pmatrix}.\]

source
GeometricMachineLearning.cayleyMethod
cayley(B::StiefelLieAlgHorMatrix)

Compute the Cayley retraction of B and multiply it with E (the distinct element of the Stiefel manifold).

source
GeometricMachineLearning.cayleyMethod
cayley(Y::Manifold, Δ)

Take as input an element of a manifold Y and a tangent vector in Δ in the corresponding tangent space and compute the Cayley retraction.

In different notation: take as input an element $x$ of $\mathcal{M}$ and an element of $T_x\mathcal{M}$ and return $\mathrm{Cayley}(v_x).$ For example:

Y = rand(StiefelManifold{Float64}, N, n)
Δ = rgrad(Y, rand(N, n))
cayley(Y, Δ)

See the docstring for rgrad for details on this function.

source
GeometricMachineLearning.geodesicMethod
geodesic(Y::Manifold, Δ)

Take as input an element of a manifold Y and a tangent vector in Δ in the corresponding tangent space and compute the geodesic (exponential map).

In different notation: take as input an element $x$ of $\mathcal{M}$ and an element of $T_x\mathcal{M}$ and return $\mathtt{geodesic}(x, v_x) = \exp(v_x).$ For example:

Y = rand(StiefelManifold{Float64}, N, n)
Δ = rgrad(Y, rand(N, n))
geodesic(Y, Δ)

See the docstring for rgrad for details on this function.

source
GeometricMachineLearning.global_repMethod
global_rep(λY::GlobalSection{T, AT}, Δ::AbstractMatrix{T}) where {T, AT<:GrassmannManifold{T}}

Express Δ (an element of the tangent space of Y) as an instance of GrassmannLieAlgHorMatrix.

The method global_rep for GrassmannManifold is similar to that for StiefelManifold.

Examples

using GeometricMachineLearning
using GeometricMachineLearning: _round
import Random 

Random.seed!(123)

Y = rand(GrassmannManifold, 6, 3)
Δ = rgrad(Y, randn(6, 3))
λY = GlobalSection(Y)

_round(global_rep(λY, Δ); digits = 3)

# output

6×6 GrassmannLieAlgHorMatrix{Float64, Matrix{Float64}}:
  0.0     0.0     0.0     0.981  -2.058   0.4
  0.0     0.0     0.0    -0.424   0.733  -0.919
  0.0     0.0     0.0    -1.815   1.409   1.085
 -0.981   0.424   1.815   0.0     0.0     0.0
  2.058  -0.733  -1.409   0.0     0.0     0.0
 -0.4     0.919  -1.085   0.0     0.0     0.0
source
GeometricMachineLearning.global_repMethod
global_rep(λY::GlobalSection{T, AT}, Δ::AbstractMatrix{T}) where {T, AT<:StiefelManifold{T}}

Express Δ (an element of the tangent space of Y) as an instance of StiefelLieAlgHorMatrix.

This maps an element from $T_Y\mathcal{M}$ to an element of $\mathfrak{g}^\mathrm{hor}$.

These two spaces are isomorphic, where the isomorphism is established through $\lambda(Y)\in{}G$ via:

\[T_Y\mathcal{M} \to \mathfrak{g}^{\mathrm{hor}}, \Delta \mapsto \lambda(Y)^{-1}\Omega(Y, \Delta)\lambda(Y).\]

Also see GeometricMachineLearning.Ω.

Examples

using GeometricMachineLearning
using GeometricMachineLearning: _round
import Random 

Random.seed!(123)

Y = rand(StiefelManifold, 6, 3)
Δ = rgrad(Y, randn(6, 3))
λY = GlobalSection(Y)

_round(global_rep(λY, Δ); digits = 3)

# output

6×6 StiefelLieAlgHorMatrix{Float64, SkewSymMatrix{Float64, Vector{Float64}}, Matrix{Float64}}:
  0.0     0.679   1.925   0.981  -2.058   0.4
 -0.679   0.0     0.298  -0.424   0.733  -0.919
 -1.925  -0.298   0.0    -1.815   1.409   1.085
 -0.981   0.424   1.815   0.0     0.0     0.0
  2.058  -0.733  -1.409   0.0     0.0     0.0
 -0.4     0.919  -1.085   0.0     0.0     0.0

Implementation

The function global_rep does in fact not perform the entire map $\lambda(Y)^{-1}\Omega(Y, \Delta)\lambda(Y)$ but only

\[\Delta \mapsto \mathrm{skew}(Y^T\Delta),\]

to get the small skew-symmetric matrix and

\[\Delta \mapsto (\lambda(Y)_{[1:N, n:N]}^T \Delta)_{[1:(N-n), 1:n]},\]

for the arbitrary matrix.

source
GeometricMachineLearning.global_sectionMethod
global_section(Y::StiefelManifold)

Compute a matrix of size $N\times(N-n)$ whose columns are orthogonal to the columns in Y.

This matrix is also called $Y_\perp$ [6, 10, 11].

Examples

using GeometricMachineLearning
using GeometricMachineLearning: global_section
import Random

Random.seed!(123)

Y = StiefelManifold([1. 0.; 0. 1.; 0. 0.; 0. 0.])

round.(Matrix(global_section(Y)); digits = 3)

# output

4×2 Matrix{Float64}:
 0.0    -0.0
 0.0     0.0
 0.936  -0.353
 0.353   0.936

Further note that we convert the QRCompactWYQ object to a Matrix before we display it.

Implementation

The implementation is done with a QR decomposition (LinearAlgebra.qr!). Internally we do:

A = randn(N, N - n) # or the gpu equivalent
A = A - Y.A * (Y.A' * A)
qr!(A).Q
source
GeometricMachineLearning.init_optimizer_cacheMethod
init_optimizer_cache(method, x)

Initialize the cache corresponding to the weights x for a specific method.

Implementation

Wrapper for the functions setup_adam_cache, setup_momentum_cache, setup_gradient_cache, setup_bfgs_cache. These appear outside of optimizer_caches.jl because the OptimizerMethods first have to be defined.

source
GeometricMachineLearning.mat_tensor_mul!Method
mat_tensor_mul!(C::AbstractArray{T, 3}, A::LowerTriangular{T}, B::AbstractArray{T, 3}) where T

Multiply the lower-triangular matrix A onto the tensor B from the left and store the result in C.

Also checks the bounds of the input arrays.

This performs an efficient multiplication based on the special structure of the lower-triangular matrix A.

source
GeometricMachineLearning.mat_tensor_mul!Method
mat_tensor_mul!(C::AbstractArray{T, 3}, A::SkewSymMatrix{T}, B::AbstractArray{T, 3}) where T

Multiply the skew-symmetric matrix A onto the tensor B from the left and store the result in C.

Also checks the bounds of the input arrays.

This performs an efficient multiplication based on the special structure of the skew-symmetric matrix A.

source
GeometricMachineLearning.mat_tensor_mul!Method
mat_tensor_mul!(C::AbstractArray{T, 3}, A::SymmetricMatrix{T}, B::AbstractArray{T, 3}) where T

Multiply the symmetric matrix A onto the tensor B from the left and store the result in C.

Also checks the bounds of the input arrays.

This performs an efficient multiplication based on the special structure of the symmetric matrix A.

source
GeometricMachineLearning.mat_tensor_mul!Method
mat_tensor_mul!(C::AbstractArray{T, 3}, A::UpperTriangular{T}, B::AbstractArray{T, 3}) where T

Multiply the upper-triangular matrix A onto the tensor B from the left and store the result in C.

Also checks the bounds of the input arrays.

This performs an efficient multiplication based on the special structure of the upper-triangular matrix A.

source
GeometricMachineLearning.mat_tensor_mulMethod
mat_tensor_mul(A::AbstractMatrix{T}, B::AbstractArray{T, 3}) where T

Multiply the matrix A onto the tensor B from the left.

Internally this calls the inplace version mat_tensor_mul!.

Examples

using GeometricMachineLearning: mat_tensor_mul

B = [1 1 1; 1 1 1; 1 1 1;;; 2 2 2; 2 2 2; 2 2 2]
A = [3 0 0; 0 2 0; 0 0 1]

mat_tensor_mul(A, B)

# output

3×3×2 Array{Int64, 3}:
[:, :, 1] =
 3  3  3
 2  2  2
 1  1  1

[:, :, 2] =
 6  6  6
 4  4  4
 2  2  2
source
GeometricMachineLearning.metricMethod
metric(Y::GrassmannManifold, Δ₁::AbstractMatrix, Δ₂::AbstractMatrix)

Compute the metric for vectors Δ₁ and Δ₂ at Y.

The representation of the Grassmann manifold is realized as a quotient space of the Stiefel manifold.

The metric for the Grassmann manifold is:

\[g^{Gr}_Y(\Delta_1, \Delta_2) = g^{St}_Y(\Delta_1, \Delta_2) = \mathrm{Tr}(\Delta_1^T (\mathbb{I} - Y Y^T) \Delta_2) = \mathrm{Tr}(\Delta_1^T \Delta_2).\]

source
GeometricMachineLearning.metricMethod

Implements the canonical Riemannian metric for the Stiefel manifold:

\[g_Y: (\Delta_1, \Delta_2) \mapsto \mathrm{tr}(\Delta_1^T(\mathbb{I} - \frac{1}{2}YY^T)\Delta_2).\]

It is called with:

  • Y::StiefelManifold
  • Δ₁::AbstractMatrix
  • Δ₂::AbstractMatrix
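
A quick numerical check of the formula with plain matrices (not the StiefelManifold type):

using LinearAlgebra: I, tr

Y = [1. 0.; 0. 1.; 0. 0.; 0. 0.]
Δ = [0. -1.; 1. 0.; 5. 6.; 7. 8.]   # a tangent vector at Y, i.e. Y'Δ is skew-symmetric
tr(Δ' * (I - Y * Y' / 2) * Δ)       # returns 175.0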
source
GeometricMachineLearning.onehotbatchMethod

One-hot-batch encoding of a vector of integers: $input\in\{0,1,\ldots,9\}^\ell$. The output is a tensor of shape $10\times1\times\ell$.

\[0 \mapsto \begin{bmatrix} 1 & 0 & \ldots & 0 \end{bmatrix}.\]

In more abstract terms: $i \mapsto e_{i+1}$.
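
A minimal usage sketch (assuming the method accepts a Vector of integers in $\{0, \ldots, 9\}$; the concrete array type of the output may differ):

using GeometricMachineLearning: onehotbatch

batch = onehotbatch([0, 3])   # ones at positions (1, 1, 1) and (4, 1, 2)
size(batch)                   # (10, 1, 2)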

source
GeometricMachineLearning.optimization_step!Method
optimization_step!(o, λY, ps, cache, dx)

Update the weights ps of a layer based on an Optimizer, a cache and first-order derivatives dx.

The derivatives dx here are usually obtained via an AD routine by differentiating a loss function, i.e. dx is $\nabla_xL$.

Internally it calls the function update!, which has to be implemented for every OptimizerMethod.

source
GeometricMachineLearning.optimize_for_one_epoch!Method

Optimize for an entire epoch. For this you have to supply:

  • an instance of the optimizer.
  • the neural network model
  • the parameters of the model
  • the data (in form of DataLoader)
  • an instance of Batch that contains batch_size (and optionally seq_length)

With the optional argument:

  • the loss, which takes the model, the parameters ps and an instance of DataLoader as input.

The output of optimize_for_one_epoch! is the average loss over all batches of the epoch:

\[output = \frac{1}{\mathtt{steps\_per\_epoch}}\sum_{t=1}^\mathtt{steps\_per\_epoch}loss(\theta^{(t-1)}).\]

Computing this average comes at no extra cost because any reverse-mode differentiation routine returns two outputs: a pullback and the value of the function it differentiates. In the case of Zygote: loss_value, pullback = Zygote.pullback(ps -> loss(ps), ps) (if the loss only depends on the parameters).
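
The following sketch illustrates this pullback pattern in isolation (the loss and the parameters here are hypothetical stand-ins, not the library's own):

import Zygote

loss(ps) = sum(abs2, ps)      # hypothetical loss depending only on the parameters
ps = [1.0, 2.0, 3.0]

loss_value, pullback = Zygote.pullback(ps -> loss(ps), ps)
dx = pullback(1.0)[1]         # gradient of the loss with respect to ps
(loss_value, dx)              # (14.0, [2.0, 4.0, 6.0])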

source
GeometricMachineLearning.rgradMethod
rgrad(Y::GrassmannManifold, e_grad::AbstractMatrix)

Compute the Riemannian gradient at $Y\in{}Gr(n, N)$.

These gradients have the property that they are orthogonal to the space spanned by $Y$.

The precise form of the mapping is:

\[\mathtt{rgrad}(Y, \nabla{}L) \mapsto \nabla{}L - YY^T\nabla{}L\]

Note the property $Y^T\mathrm{rgrad}(Y, \nabla{}L) = \mathbb{O}.$

Also see rgrad(::StiefelManifold, ::AbstractMatrix).

Examples

using GeometricMachineLearning

Y = GrassmannManifold([1 0 ; 0 1 ; 0 0; 0 0])
Δ = [1 2; 3 4; 5 6; 7 8]
rgrad(Y, Δ)

# output

4×2 Matrix{Int64}:
 0  0
 0  0
 5  6
 7  8
source
GeometricMachineLearning.rgradMethod
rgrad(Y::StiefelManifold, e_grad::AbstractMatrix)

Compute the Riemannian gradient for the Stiefel manifold at $Y\in{}St(N,n)$ based on $\nabla{}L\in\mathbb{R}^{N\times{}n}$ (the Euclidean gradient).

The function computes the Riemannian gradient with respect to the canonical metric.

The precise form of the mapping is:

\[\mathtt{rgrad}(Y, \nabla{}L) \mapsto \nabla{}L - Y(\nabla{}L)^TY\]

Note the property $Y^T\mathrm{rgrad}(Y, \nabla{}L)\in\mathcal{S}_\mathrm{skew}(n).$

Examples

using GeometricMachineLearning

Y = StiefelManifold([1 0 ; 0 1 ; 0 0; 0 0])
Δ = [1 2; 3 4; 5 6; 7 8]
rgrad(Y, Δ)

# output

4×2 Matrix{Int64}:
 0  -1
 1   0
 5   6
 7   8
source
GeometricMachineLearning.split_and_flattenMethod

split_and_flatten takes a tensor as input and produces another one as output: it splits the input into patches and flattens them so that the result can easily be processed with a transformer.

The optional arguments are:

  • patch_length: by default this is 7.
  • number_of_patches: by default this is 16.
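
A usage sketch under the assumption that the two optional arguments are keyword arguments and that the flattened patches are stored along the first dimension of the output:

using GeometricMachineLearning: split_and_flatten

input = rand(Float32, 28, 28, 5)   # e.g. five 28×28 images
output = split_and_flatten(input; patch_length = 7, number_of_patches = 16)
size(output)                       # presumably (49, 16, 5): 16 flattened 7×7 patches per image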
source
GeometricMachineLearning.tensor_mat_mul!Method
tensor_mat_mul!(C::AbstractArray{T, 3}, B::AbstractArray{T, 3}, A::SymmetricMatrix{T}) where T

Multiply the symmetric matrix A onto the tensor B from the right and store the result in C.

Also checks the bounds of the input arrays.

This performs an efficient multiplication based on the special structure of the symmetric matrix A.

source
GeometricMachineLearning.tensor_mat_mulMethod
tensor_mat_mul(A::AbstractArray{T, 3}, B::AbstractArray{T}) where T

Multiply the matrix B onto the tensor A from the right.

Internally this calls the inplace version tensor_mat_mul!.

Examples

using GeometricMachineLearning: tensor_mat_mul

A = [1 1 1; 1 1 1; 1 1 1;;; 2 2 2; 2 2 2; 2 2 2]
B = [3 0 0; 0 2 0; 0 0 1]

tensor_mat_mul(A, B)

# output

3×3×2 Array{Int64, 3}:
[:, :, 1] =
 3  2  1
 3  2  1
 3  2  1

[:, :, 2] =
 6  4  2
 6  4  2
 6  4  2
source
GeometricMachineLearning.tensor_mat_skew_sym_assignMethod

Takes as input:

  • Z::AbstractArray{T, 3}: A tensor that stores a bunch of time series.
  • A::AbstractMatrix: A matrix that is used to perform various scalar products.

For each of these time series the function performs the following computation:

\[ (z^{(i)}, z^{(j)}) \mapsto (z^{(i)})^TAz^{(j)} \text{ for } i > j.\]

This results in $n(n-1)\div2$ scalar products. These scalar products are written into a lower-triangular matrix and the final output of the function is a tensor of these lower-triangular matrices.
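
The computation for a single time series can be sketched in plain Julia as follows (the library routine does this for every slice of the input tensor and collects the results in a tensor):

Z = rand(4, 3)   # one time series with n = 3 steps z⁽¹⁾, z⁽²⁾, z⁽³⁾
A = rand(4, 4)

# n(n-1)÷2 = 3 scalar products, stored in the strictly lower triangle:
L = [i > j ? Z[:, i]' * A * Z[:, j] : 0.0 for i in 1:3, j in 1:3]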

source
GeometricMachineLearning.train!Function
train!(...)

Perform the training of a neural network on data using a given training method.

Different ways of use:

train!(neuralnetwork, data, optimizer = GradientOptimizer(1e-2), training_method; nruns = 1000, batch_size = default(data, type), showprogress = false )

Arguments

  • neuralnetwork::LuxNeuralNetwork : the neural network using the LuxBackend
  • data : the data (see TrainingData)
  • optimizer = GradientOptimizer: the optimization method (see Optimizer)
  • training_method : specify the loss function used
  • nruns : the number of iterations through the training process (has a default value)
  • batch_size : the size of the data batch used in each step
source
GeometricMachineLearning.train!Method
train!(neuralnetwork, data, optimizer, training_method; nruns = 1000, batch_size, showprogress = false )

Arguments

  • neuralnetwork::LuxNeuralNetwork : the neural network using the LuxBackend
  • data::AbstractTrainingData : the data
source
GeometricMachineLearning.ΩMethod
Ω(Y::GrassmannManifold{T}, Δ::AbstractMatrix{T}) where T

Perform the canonical horizontal lift for the Grassmann manifold:

\[ \Delta \mapsto \Omega^{St}(Y, \Delta),\]

where $\Omega^{St}$ is the canonical horizontal lift for the Stiefel manifold.

Examples

using GeometricMachineLearning
E = GrassmannManifold(StiefelProjection(5, 2))
Δ = [0. 0.; 0. 0.; 2. 3.; 4. 5.; 6. 7.]
GeometricMachineLearning.Ω(E, Δ)

# output

5×5 SkewSymMatrix{Float64, Vector{Float64}}:
 0.0  -0.0  -2.0  -4.0  -6.0
 0.0   0.0  -3.0  -5.0  -7.0
 2.0   3.0   0.0  -0.0  -0.0
 4.0   5.0   0.0   0.0  -0.0
 6.0   7.0   0.0   0.0   0.0
source
GeometricMachineLearning.ΩMethod
Ω(Y::StiefelManifold{T}, Δ::AbstractMatrix{T}) where T

Perform the canonical horizontal lift for the Stiefel manifold:

\[ \Delta \mapsto (\mathbb{I} - \frac{1}{2}YY^T)\Delta{}Y^T - Y\Delta^T(\mathbb{I} - \frac{1}{2}YY^T).\]

Internally this performs

SkewSymMatrix(2 * (I(N) - .5 * Y * Y') * Δ * Y')

to save memory.

Examples

using GeometricMachineLearning
E = StiefelManifold(StiefelProjection(5, 2))
Δ = [0. -1.; 1. 0.; 2. 3.; 4. 5.; 6. 7.]
GeometricMachineLearning.Ω(E, Δ)

# output

5×5 SkewSymMatrix{Float64, Vector{Float64}}:
 0.0  -1.0  -2.0  -4.0  -6.0
 1.0   0.0  -3.0  -5.0  -7.0
 2.0   3.0   0.0  -0.0  -0.0
 4.0   5.0   0.0   0.0  -0.0
 6.0   7.0   0.0   0.0   0.0

Note that the output of Ω is a skew-symmetric matrix, i.e. an element of $\mathfrak{g}$.

source
GeometricMachineLearning.𝔄Method
𝔄(A)

Compute $\mathfrak{A}(A) := \sum_{n=1}^\infty \frac{1}{n!} (A)^{n-1}.$

Implementation

This uses a Taylor expansion that iteratively adds terms with

while norm(Aⁿ) > ε
    mul!(A_temp, Aⁿ, A)
    Aⁿ .= A_temp
    rmul!(Aⁿ, inv(n))   # Aⁿ now holds Aⁿ⁻¹/n!

    𝔄 += Aⁿ             # add the next term of the series
    n += 1
end

until the norm of Aⁿ becomes smaller than machine precision ε. The counter n in the above algorithm is initialized as 2. The matrices Aⁿ and 𝔄 are both initialized as the identity matrix.
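
If the single-argument method also accepts a plain square matrix (an assumption made for this sketch), the series can be checked against the matrix exponential:

using GeometricMachineLearning: 𝔄
using LinearAlgebra: I

A = 0.1 * rand(4, 4)
I + A * 𝔄(A) ≈ exp(A)   # A⋅𝔄(A) = ∑ₙ Aⁿ/n! = exp(A) - 𝕀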

source
GeometricMachineLearning.𝔄Method
𝔄(B̂, B̄)

Compute $\mathfrak{A}(B', B'') := \sum_{n=1}^\infty \frac{1}{n!} ((B'')^TB')^{n-1}.$

This expression has the property $\mathbb{I} + B'\mathfrak{A}(B', B'')(B'')^T = \exp(B'(B'')^T).$

Examples

using GeometricMachineLearning
using GeometricMachineLearning: 𝔄
import Random
Random.seed!(123)

B = rand(StiefelLieAlgHorMatrix, 10, 2)
B̂ = hcat(vcat(.5 * B.A, B.B), vcat(one(B.A), zero(B.B)))
B̄ = hcat(vcat(one(B.A), zero(B.B)), vcat(-.5 * B.A, -B.B))

one(B̂ * B̄') + B̂ * 𝔄(B̂, B̄) * B̄' ≈ exp(Matrix(B))

# output

true
source