GeometricMachineLearning Library Functions
GeometricMachineLearning.QPTOAT
— TypeA union of two types:
const QPTOAT = Union{QPT, AbstractArray}
AbstractNeuralNetworks.Chain
— MethodChain can also be called with a neural network as input.
AbstractNeuralNetworks.Chain
— MethodBuild a chain for an LASympnet for which init_upper_linear is false and init_upper_act is false.
AbstractNeuralNetworks.Chain
— MethodBuild a chain for an LASympnet for which init_upper_linear is false and init_upper_act is true.
AbstractNeuralNetworks.Chain
— MethodBuild a chain for an LASympnet for which init_upper_linear is true and init_upper_act is false.
AbstractNeuralNetworks.Chain
— MethodBuild a chain for an LASympnet for which init_upper_linear is true and init_upper_act is true.
Base.Matrix
— MethodMatrix(λY::GlobalSection)
Put λY into matrix form.
This is not recommended if speed is important! Use apply_section and global_rep instead!
GeometricMachineLearning.AbstractCache
— TypeAbstractCache has subtypes:
All of them can be initialized by providing an array (also supporting manifold types).
GeometricMachineLearning.AbstractLieAlgHorMatrix
— TypeAbstractLieAlgHorMatrix is a supertype for various horizontal components of Lie algebras. We usually call this $\mathfrak{g}^\mathrm{hor}$.
GeometricMachineLearning.AbstractRetraction
— TypeAbstractRetraction is a type that comprises all retraction methods for manifolds. For every manifold layer one has to specify a retraction method that takes the layer and elements of the (global) tangent space.
GeometricMachineLearning.AbstractTriangular
— TypeSee UpperTriangular and LowerTriangular.
GeometricMachineLearning.ActivationLayer
— TypeActivationLayer is the struct corresponding to the constructors ActivationLayerQ and ActivationLayerP. See those for more information.
GeometricMachineLearning.ActivationLayerP
— TypeActivationLayerP(n, σ)
Make an activation layer of size $n$ and with activation $σ$ that only changes the $p$ component.
Performs:
\[\begin{pmatrix} q \\ p \end{pmatrix} \mapsto \begin{pmatrix} q \\ p + \mathrm{diag}(a)\sigma(q) \end{pmatrix}.\]
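As a plain-array sketch (not the package API; σ and the values of a, q and p are assumptions standing in for the learned scaling vector and the activation), the map above reads:
σ = tanh
a = [0.5, 0.5]
q = [1.0, 2.0]; p = [0.0, 0.0]
p_new = p .+ a .* σ.(q)   # only the p component changes: p + diag(a)σ(q)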
GeometricMachineLearning.ActivationLayerQ
— TypeActivationLayerQ(n, σ)
Make an activation layer of size $n$ and with activation $σ$ that only changes the $q$ component.
Performs:
\[\begin{pmatrix} q \\ p \end{pmatrix} \mapsto \begin{pmatrix} q + \mathrm{diag}(a)\sigma(p) \\ p \end{pmatrix}.\]
GeometricMachineLearning.AdamCache
— TypeAdamCache(Y)
Store the first and second moment for Y (initialized as zeros). The first and second moments are called B₁ and B₂.
If the cache is called with an instance of a homogeneous space, e.g. the StiefelManifold $St(n,N)$, it initializes the moments as elements of $\mathfrak{g}^\mathrm{hor}$ (StiefelLieAlgHorMatrix).
Examples
using GeometricMachineLearning
Y = rand(StiefelManifold, 5, 3)
AdamCache(Y).B₁
# output
5×5 StiefelLieAlgHorMatrix{Float64, SkewSymMatrix{Float64, Vector{Float64}}, Matrix{Float64}}:
0.0 -0.0 -0.0 -0.0 -0.0
0.0 0.0 -0.0 -0.0 -0.0
0.0 0.0 0.0 -0.0 -0.0
0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0
GeometricMachineLearning.AdamOptimizer
— TypeAdamOptimizer(η, ρ₁, ρ₂, δ)
Make an instance of the Adam Optimizer.
Here the cache consists of first and second moments that are updated as
\[B_1 \gets ((\rho_1 - \rho_1^t)/(1 - \rho_1^t))\cdot{}B_1 + (1 - \rho_1)/(1 - \rho_1^t)\cdot{}\nabla{}L,\]
and
\[B_2 \gets ((\rho_2 - \rho_2^t)/(1 - \rho_2^t))\cdot{}B_2 + (1 - \rho_2)/(1 - \rho_2^t)\cdot\nabla{}L\odot\nabla{}L.\]
The final velocity is computed as:
\[\mathrm{velocity} \gets -\eta{}B_1/\sqrt{B_2 + \delta}.\]
Implementation
The velocity is stored in the input to save memory.
Algorithm and suggested defaults are taken from [20, page 301].
GeometricMachineLearning.AdamOptimizerWithDecay
— TypeAdamOptimizerWithDecay(n_epochs, η₁=1f-2, η₂=1f-6, ρ₁=9f-1, ρ₂=9.9f-1, δ=1f-8)
Make an instance of the Adam Optimizer with weight decay.
All except the first argument (the number of epochs) have defaults.
The difference to the standard AdamOptimizer is that the learning rate $\eta$ changes in each step. Apart from the time dependence of $\eta$ the two algorithms are equivalent. $\eta(t)$ starts at a high value $\eta_1$ and decreases exponentially until it reaches $\eta_2$, with
\[ \eta(t) = \gamma^t\eta_1,\]
where $\gamma = \exp(\log(\eta_2 / \eta_1) / \mathtt{n\_epochs})$, so that $\eta(\mathtt{n\_epochs}) = \eta_2$.
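A minimal plain-Julia sketch of this schedule (using the stated defaults and a hypothetical epoch count; not the package API) confirms that $\eta$ interpolates between $\eta_1$ and $\eta_2$:
η₁, η₂, n_epochs = 1f-2, 1f-6, 100
γ = exp(log(η₂ / η₁) / n_epochs)
η(t) = γ^t * η₁
(η(0), η(n_epochs))   # ≈ (1f-2, 1f-6)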
GeometricMachineLearning.AutoEncoder
— TypeAn autoencoder [20] is a neural network consisting of an encoder $\Psi^e$ and a decoder $\Psi^d$. In the simplest case they are trained on some data set $\mathcal{D}$ to reduce the following error:
\[||\Psi^d\circ\Psi^e(\mathcal{D}) - \mathcal{D}||,\]
which we call the reconstruction error or autoencoder error (see the docs for AutoEncoderLoss); here $||\cdot||$ is some norm.
Implementation
Abstract AutoEncoder type. If a custom <:AutoEncoder architecture is implemented it should have the fields full_dim, reduced_dim, n_encoder_blocks and n_decoder_blocks. Further the routines encoder, decoder, encoder_parameters and decoder_parameters should be extended.
GeometricMachineLearning.AutoEncoderLoss
— TypeThis loss should always be used together with a neural network of type AutoEncoder (and it is also the default for training such a network).
It simply computes:
\[\mathtt{AutoEncoderLoss}(nn, \mathrm{input}) = ||nn(\mathrm{input}) - \mathrm{input}||.\]
GeometricMachineLearning.BFGSCache
— TypeBFGSCache(B)
Make the cache for the BFGS optimizer based on the array B.
It stores an array for the gradient of the previous time step B and the inverse of the Hessian matrix H.
The cache for the inverse of the Hessian is initialized with the identity. The cache for the previous gradient information is initialized with the zero vector.
Note that the cache for H is changed iteratively, whereas the cache for B is newly assigned at every time step.
GeometricMachineLearning.BFGSOptimizer
— TypeBFGSOptimizer(η, δ)
Make an instance of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimizer.
η is the learning rate and δ is a stabilization parameter.
GeometricMachineLearning.Batch
— TypeBatch is a struct whose functor acts on an instance of DataLoader to produce a sequence of training samples for training for one epoch.
The Constructor
The constructor for Batch is called with:
- batch_size::Int
- seq_length::Int (optional)
- prediction_window::Int (optional)
The first of these arguments is required; it indicates the number of training samples in a batch. If we deal with time series data then we can additionally supply a sequence length and a prediction window as input arguments to Batch. These indicate the number of input vectors and the number of output vectors.
The functor
An instance of Batch can be called on an instance of DataLoader to produce a sequence of samples that contain all the input data, i.e. for training for one epoch. The output of applying batch::Batch to dl::DataLoader is a tuple of vectors of integers. Each of these vectors contains two integers: the first is the time index and the second one is the parameter index.
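A minimal sketch, assuming the constructor and functor behave as described above (the data matrix here is arbitrary):
using GeometricMachineLearning
dl = DataLoader(rand(2, 8))
batch = Batch(2)
batches = batch(dl)   # index batches covering one epoch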
GeometricMachineLearning.BiasLayer
— TypeA bias layer that does nothing more than add a vector to the input. This is needed for LA-SympNets.
GeometricMachineLearning.Classification
— TypeClassification Layer that takes a matrix as an input and returns a vector that is used for MNIST classification.
It has the following arguments:
- M: input dimension
- N: output dimension
- activation: the activation function
And the following optional argument:
- average: If this is set to true, then the output is computed as $\frac{1}{N}\sum_{i=1}^N[\mathrm{input}]_{\bullet{}i}$. If set to false (the default) it picks the last column of the input.
GeometricMachineLearning.ClassificationTransformer
— TypeThis is a transformer neural network for classification purposes. At the moment this is only used for training on MNIST, but can in theory be used for any classification problem.
It has to be called with a DataLoader that stores an input and an output tensor. The optional arguments are:
- n_heads: The number of heads in the MultiHeadAttention (mha) layers. Default: 7.
- n_layers: The number of transformer layers. Default: 16.
- activation: The activation function. Default: softmax.
- Stiefel: Whether the matrices in the mha layers are on the Stiefel manifold.
- add_connection: Whether the input is appended to the output of the mha layer (skip connection).
GeometricMachineLearning.DataLoader
— TypeDataLoader(data)
Make an instance based on a data set.
It is designed to make training convenient.
Constructor
The data loader can be called with various inputs:
- A single vector: If the data loader is called with a single vector (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the second axis indicates parameter values and/or time steps and the system has a single degree of freedom (i.e. the system dimension is one).
- A single matrix: If the data loader is called with a single matrix (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the first axis is assumed to indicate the degrees of freedom of the system and the second axis indicates parameter values and/or time steps.
- A single tensor: If the data loader is called with a single tensor, then this is interpreted as an integration problem with the second axis indicating the time step and the third one indicating the parameters.
- A tensor and a vector: This is a special case (MNIST classification problem). For the MNIST problem for example the input are $n_p$ matrices (first input argument) and $n_p$ integers (second input argument).
- A NamedTuple with fields q and p: The NamedTuple contains (i) two matrices or (ii) two tensors.
- An EnsembleSolution: The EnsembleSolution typically comes from GeometricProblems.
When we supply a single vector or a single matrix as input to DataLoader and further set autoencoder = false (keyword argument), then the data are stored as an integration problem and the second axis is assumed to indicate time steps.
Fields of DataLoader
The fields of the DataLoader struct are the following:
- input: The input data with axes (i) system dimension, (ii) number of time steps and (iii) number of parameters.
- output: The tensor that contains the output (supervised learning); this may be of type Nothing if the constructor is only called with one tensor (unsupervised learning).
- input_dim: The dimension of the system, i.e. what is taken as input by a regular neural network.
- input_time_steps: The length of the entire time series (length of the second axis).
- n_params: The number of parameters that are present in the data set (length of the third axis).
- output_dim: The dimension of the output tensor (first axis). If output is of type Nothing, then this is also of type Nothing.
- output_time_steps: The size of the second axis of the output tensor. If output is of type Nothing, then this is also of type Nothing.
Implementation
Even though DataLoader can be called with inputs of various forms, internally it always stores tensors with three axes.
using GeometricMachineLearning
data = [1 2 3; 4 5 6]
dl = DataLoader(data)
dl.input
# output
[ Info: You have provided a matrix as input. The axes will be interpreted as (i) system dimension and (ii) number of parameters.
2×1×3 Array{Int64, 3}:
[:, :, 1] =
1
4
[:, :, 2] =
2
5
[:, :, 3] =
3
6
GeometricMachineLearning.DataLoader
— MethodDataLoader(data)
Make an instance based on a data set.
It is designed to make training convenient.
Constructor
The data loader can be called with various inputs:
- A single vector: If the data loader is called with a single vector (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the second axis indicates parameter values and/or time steps and the system has a single degree of freedom (i.e. the system dimension is one).
- A single matrix: If the data loader is called with a single matrix (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the first axis is assumed to indicate the degrees of freedom of the system and the second axis indicates parameter values and/or time steps.
- A single tensor: If the data loader is called with a single tensor, then this is interpreted as an integration problem with the second axis indicating the time step and the third one indicating the parameters.
- A tensor and a vector: This is a special case (MNIST classification problem). For the MNIST problem for example the input are $n_p$ matrices (first input argument) and $n_p$ integers (second input argument).
- A NamedTuple with fields q and p: The NamedTuple contains (i) two matrices or (ii) two tensors.
- An EnsembleSolution: The EnsembleSolution typically comes from GeometricProblems.
When we supply a single vector or a single matrix as input to DataLoader and further set autoencoder = false (keyword argument), then the data are stored as an integration problem and the second axis is assumed to indicate time steps.
GeometricMachineLearning.DataLoader
— MethodDataLoader(data::AbstractMatrix)
Make an instance of DataLoader based on a matrix.
Arguments
This has an additional keyword argument autoencoder, which is set to true by default.
GeometricMachineLearning.DataLoader
— MethodConstructor for EnsembleSolution from the package GeometricSolutions with field q.
GeometricMachineLearning.DataLoader
— MethodConstructor for EnsembleSolution from the package GeometricSolutions with fields q and p.
GeometricMachineLearning.Decoder
— TypeAbstract Decoder type.
Implementation
If a custom <:Decoder architecture is implemented it should have the fields full_dim, reduced_dim and n_decoder_blocks.
GeometricMachineLearning.Encoder
— TypeAbstract Encoder type.
Implementation
If a custom <:Encoder architecture is implemented it should have the fields full_dim, reduced_dim and n_encoder_blocks.
GeometricMachineLearning.FeedForwardLoss
— TypeFeedForwardLoss()
Make an instance of a loss for feedforward neural networks.
This doesn't have any parameters.
GeometricMachineLearning.GSympNet
— TypeGSympNet(d)
Make a $G$-SympNet with dimension $d.$
There exists an additional constructor that can be called by supplying an instance of DataLoader.
Arguments
Keyword arguments are:
- upscaling_dimension::Int: The upscaling dimension of the gradient layer. See the documentation for GradientLayerQ and GradientLayerP for further explanation. The default is 2*dim.
- n_layers::Int: The number of layers (i.e. the total number of GradientLayerQ and GradientLayerP). The default is 2.
- activation: The activation function that is applied. By default this is tanh.
- init_upper::Bool: Initialize the gradient layer so that it first modifies the $q$-component. The default is true.
GeometricMachineLearning.GlobalSection
— TypeGlobalSection(Y::AbstractMatrix)
Construct a global section for Y.
A global section $\lambda$ is a mapping from a homogeneous space $\mathcal{M}$ to the corresponding Lie group $G$ such that
\[\lambda(Y)E = Y.\]
Also see apply_section and global_rep.
Implementation
For an implementation of GlobalSection for a custom array (especially manifolds), the function global_section has to be generalized.
GeometricMachineLearning.GradientCache
— TypeGradientCache(Y)
Do not store anything.
The cache for the GradientOptimizer
does not consider past information.
GeometricMachineLearning.GradientLayer
— TypeGradientLayer is the struct corresponding to the constructors GradientLayerQ and GradientLayerP. See those for more information.
GeometricMachineLearning.GradientLayerP
— TypeGradientLayerP(n, upscaling_dimension, activation)
Make an instance of a gradient-$p$ layer.
The gradient layer that changes the $p$ component. It is of the form:
\[\begin{bmatrix} \mathbb{I} & \mathbb{O} \\ \nabla{}V & \mathbb{I} \end{bmatrix},\]
with $V(q) = \sum_{i=1}^Ma_i\Sigma(\sum_jk_{ij}q_j+b_i)$, where $\Sigma$ is the antiderivative of the activation function $\sigma$ (one-layer neural network). We refer to $M$ as the upscaling dimension. Such layers are by construction symplectic.
GeometricMachineLearning.GradientLayerQ
— TypeGradientLayerQ(n, upscaling_dimension, activation)
Make an instance of a gradient-$q$ layer.
The gradient layer that changes the $q$ component. It is of the form:
\[\begin{bmatrix} \mathbb{I} & \nabla{}V \\ \mathbb{O} & \mathbb{I} \end{bmatrix},\]
with $V(p) = \sum_{i=1}^Ma_i\Sigma(\sum_jk_{ij}p_j+b_i)$, where $\Sigma$ is the antiderivative of the activation function $\sigma$ (one-layer neural network). We refer to $M$ as the upscaling dimension. Such layers are by construction symplectic.
GeometricMachineLearning.GradientOptimizer
— TypeGradientOptimizer(η)
Make an instance of a gradient optimizer.
This is the simplest neural network optimizer. It has no cache and computes the final velocity as:
\[ \mathrm{velocity} \gets - \eta\nabla_\mathrm{weight}L.\]
Implementation
The operations are done as memory efficiently as possible. This means the provided $\nabla_WL$ is mutated via:
rmul!(∇L, -method.η)
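The same update on a plain array (a sketch with a hypothetical learning rate η; ∇L is mutated to hold the velocity):
using LinearAlgebra: rmul!
η = 0.1
∇L = [1.0, 2.0]
rmul!(∇L, -η)   # ∇L now equals [-0.1, -0.2]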
GeometricMachineLearning.GrassmannLayer
— TypeDefines a layer that performs simple multiplication with an element of the Grassmann manifold.
GeometricMachineLearning.GrassmannLieAlgHorMatrix
— TypeGrassmannLieAlgHorMatrix(B::AbstractMatrix{T}, N::Integer, n::Integer) where T
Build an instance of GrassmannLieAlgHorMatrix based on an arbitrary matrix B of size $(N-n)\times{}n$.
GrassmannLieAlgHorMatrix is the horizontal component of the Lie algebra of skew-symmetric matrices (with respect to the canonical metric). The projection here is $\pi:S \to SE/\sim$, where
\[E = \begin{pmatrix} \mathbb{I}_{n} \\ \mathbb{O}_{(N-n)\times{}n} \end{pmatrix},\]
and the equivalence relation is
\[V_1 \sim V_2 \iff \exists A\in\mathcal{S}_\mathrm{skew}(n) \text{ such that } V_2 = V_1 + \begin{pmatrix} A \\ \mathbb{O} \end{pmatrix}.\]
An element of GrassmannLieAlgHorMatrix takes the form:
\[\begin{pmatrix} \bar{\mathbb{O}} & B^T \\ B & \mathbb{O} \end{pmatrix},\]
where $\bar{\mathbb{O}}\in\mathbb{R}^{n\times{}n}$ and $\mathbb{O}\in\mathbb{R}^{(N - n)\times{}n}.$
GeometricMachineLearning.GrassmannLieAlgHorMatrix
— MethodGrassmannLieAlgHorMatrix(D::AbstractMatrix, n::Integer)
Take a big matrix as input and build an instance of GrassmannLieAlgHorMatrix belonging to the Grassmann manifold $Gr(n, N)$ where $N$ is the number of rows of D.
If the constructor is called with a big $N\times{}N$ matrix, then the projection is performed the following way:
\[\begin{pmatrix} A & B_1 \\ B_2 & D \end{pmatrix} \mapsto \begin{pmatrix} \bar{\mathbb{O}} & -B_2^T \\ B_2 & \mathbb{O} \end{pmatrix}.\]
This can also be seen as the operation:
\[D \mapsto \Omega(E, DE - EE^TDE),\]
where $\Omega$ is the horizontal lift GeometricMachineLearning.Ω.
GeometricMachineLearning.GrassmannManifold
— TypeThe GrassmannManifold is based on the StiefelManifold.
GeometricMachineLearning.HRedSys
— TypeHRedSys computes the reconstructed dynamics in the full system based on the reduced one. Optionally it can be compared to the FOM solution.
It can be called using the following constructor: HRedSys(N, n; encoder, decoder, v_full, f_full, v_reduced, f_reduced, parameters, tspan, tstep, ics, projection_error)
where
- encoder: a function $\mathbb{R}^{2N}\mapsto{}\mathbb{R}^{2n}$.
- decoder: a (differentiable) function $\mathbb{R}^{2n}\mapsto\mathbb{R}^{2N}$.
- v_full: a (differentiable) mapping defined the same way as in GeometricIntegrators.
- f_full: a (differentiable) mapping defined the same way as in GeometricIntegrators.
- v_reduced: a (differentiable) mapping defined the same way as in GeometricIntegrators.
- f_reduced: a (differentiable) mapping defined the same way as in GeometricIntegrators.
- parameters: a NamedTuple that parametrizes the vector fields (the same for the full and the reduced vector field).
- tspan: a tuple (t₀, tₗ) that specifies the start and end point of the time interval over which integration is performed.
- tstep: the time step.
- ics: the initial condition for the big system.
- projection_error: the error $||M - \mathcal{R}\circ\mathcal{P}(M)||$, where $M$ is the snapshot matrix and $\mathcal{P}$ and $\mathcal{R}$ are the reduction and the reconstruction respectively.
GeometricMachineLearning.LASympNet
— TypeLASympNet(d)
Make an $LA$-SympNet with dimension $d.$
There exists an additional constructor that can be called by supplying an instance of DataLoader.
Arguments
Keyword arguments are:
- depth::Int: The number of linear layers that are applied. The default is 5.
- nhidden::Int: The number of hidden layers (i.e. layers that are not input or output layers). The default is 2.
- activation: The activation function that is applied. By default this is tanh.
- init_upper_linear::Bool: Initialize the linear layer so that it first modifies the $q$-component. The default is true.
- init_upper_act::Bool: Initialize the activation layer so that it first modifies the $q$-component. The default is true.
GeometricMachineLearning.LayerWithManifold
— TypeLayerWithManifold
is a subtype of AbstractExplicitLayer
that contains manifolds as weights.
GeometricMachineLearning.LayerWithOptionalManifold
— TypeLayerWithOptionalManifold
is a subtype of AbstractExplicitLayer
that can contain manifolds as weights.
GeometricMachineLearning.LinearLayer
— TypeLinearLayer is the struct corresponding to the constructors LinearLayerQ and LinearLayerP. See those for more information.
Implementation
LinearLayer uses the custom matrix SymmetricMatrix for an especially efficient implementation.
GeometricMachineLearning.LinearLayerP
— TypeLinearLayerP(n)
Make a linear layer of dimension $n\times{}n$ that only changes the $p$ component.
Equivalent to a left multiplication by the matrix:
\[\begin{pmatrix} \mathbb{I} & \mathbb{O} \\ B & \mathbb{I} \end{pmatrix}, \]
where $B$ is a symmetric matrix.
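A plain-array sketch of this multiplication (not the package API; the values of B, q and p are hypothetical):
B = [2.0 0.5; 0.5 1.0]          # symmetric
q = [1.0, 2.0]; p = [0.0, 1.0]
q_new = q                        # q is unchanged
p_new = B * q .+ p               # p ← Bq + p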
GeometricMachineLearning.LinearLayerQ
— TypeLinearLayerQ(n)
Make a linear layer of dimension $n\times{}n$ that only changes the $q$ component.
Equivalent to a left multiplication by the matrix:
\[\begin{pmatrix} \mathbb{I} & B \\ \mathbb{O} & \mathbb{I} \end{pmatrix}, \]
where $B$ is a symmetric matrix.
GeometricMachineLearning.LinearSymplecticAttention
— TypeImplements the linear symplectic attention layers. Analogous to GradientLayer it performs mappings that only change the $Q$ or the $P$ part. For more information see LinearSymplecticAttentionQ and LinearSymplecticAttentionP.
Constructor
For the constructors simply call
LinearSymplecticAttentionQ(sys_dim, seq_length)
or
LinearSymplecticAttentionP(sys_dim, seq_length)
where sys_dim is the system dimension and seq_length is the sequence length.
GeometricMachineLearning.LinearSymplecticAttentionP
— TypePerforms:
\[\begin{pmatrix} Q \\ P \end{pmatrix} \mapsto \begin{pmatrix} Q \\ P + \nabla_QF \end{pmatrix},\]
where $Q,\, P\in\mathbb{R}^{n\times{}T}$ and $F(Q) = \frac{1}{2}\mathrm{Tr}(Q A Q^T)$.
GeometricMachineLearning.LinearSymplecticAttentionQ
— TypePerforms:
\[\begin{pmatrix} Q \\ P \end{pmatrix} \mapsto \begin{pmatrix} Q + \nabla_PF \\ P \end{pmatrix},\]
where $Q,\, P\in\mathbb{R}^{n\times{}T}$ and $F(P) = \frac{1}{2}\mathrm{Tr}(P A P^T)$.
GeometricMachineLearning.LinearSymplecticTransformer
— TypeRealizes the linear Symplectic Transformer.
Constructor:
The constructor is called with the following arguments:
- dim::Int: system dimension.
- seq_length::Int: number of time steps that the transformer considers.
Optional keyword arguments:
- n_sympnet::Int=2: The number of sympnet layers in the transformer.
- upscaling_dimension::Int=2*dim: The upscaling that is done by the gradient layer.
- L::Int=1: The number of transformer units.
- activation=tanh: The activation function for the SympNet layers.
- init_upper::Bool=true: Specifies if the first layer is a $Q$-type layer (init_upper=true) or if it is a $P$-type layer (init_upper=false).
GeometricMachineLearning.LowerTriangular
— TypeLowerTriangular(S::AbstractVector, n::Int)
Build a lower-triangular matrix from a vector.
A lower-triangular matrix is an $n\times{}n$ matrix that has zeros on the diagonal and in the upper-triangular part.
The data are stored in a vector $S$ similarly to other matrices. See UpperTriangular, SkewSymMatrix and SymmetricMatrix.
The struct has two fields: S and n. The first stores all the entries of the matrix in a sparse fashion (in a vector) and the second is the dimension $n$ for $A\in\mathbb{R}^{n\times{}n}$.
Examples
using GeometricMachineLearning
S = [1, 2, 3, 4, 5, 6]
LowerTriangular(S, 4)
# output
4×4 LowerTriangular{Int64, Vector{Int64}}:
0 0 0 0
1 0 0 0
2 3 0 0
4 5 6 0
GeometricMachineLearning.LowerTriangular
— MethodLowerTriangular(A::AbstractMatrix)
Build a lower-triangular matrix from a matrix.
This is done by taking the lower left of that matrix.
Examples
using GeometricMachineLearning
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
LowerTriangular(M)
# output
4×4 LowerTriangular{Int64, Vector{Int64}}:
0 0 0 0
5 0 0 0
9 10 0 0
13 14 15 0
GeometricMachineLearning.Manifold
— TypeA manifold in GeometricMachineLearning is a subtype of AbstractMatrix. All manifolds are matrix manifolds and are therefore stored as matrices. More details can be found in the docstrings for the StiefelManifold and the GrassmannManifold.
GeometricMachineLearning.ManifoldLayer
— TypeThis defines a manifold layer that has one matrix-valued manifold $A$ associated with it and performs $x\mapsto{}Ax$.
GeometricMachineLearning.MomentumCache
— TypeMomentumCache(Y)
Store the moment for Y (initialized as zeros). The moment is called B.
If the cache is called with an instance of a homogeneous space, e.g. the StiefelManifold $St(n,N)$, it initializes the moment as an element of $\mathfrak{g}^\mathrm{hor}$ (StiefelLieAlgHorMatrix).
See AdamCache.
GeometricMachineLearning.MomentumOptimizer
— TypeMomentumOptimizer(η, α)
Make an instance of the momentum optimizer.
The momentum optimizer is similar to the GradientOptimizer. It however has a nontrivial cache that stores past history (see MomentumCache). The cache is updated via:
\[ B^{\mathrm{cache}} \gets \alpha{}B^{\mathrm{cache}} + \nabla_\mathrm{weights}L\]
and then the final velocity is computed as
\[ \mathrm{velocity} \gets - \eta{}B^{\mathrm{cache}},\]
or the Riemannian manifold equivalent, if applicable.
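A plain-array sketch of these two updates (hypothetical values, no package API assumed):
η, α = 0.1, 0.9
B_cache = zeros(2)
∇L = [1.0, 2.0]
B_cache .= α .* B_cache .+ ∇L    # update the cache
velocity = -η .* B_cache          # final velocity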
Implementation
To save memory the velocity is stored in the input $\nabla_WL$. This is similar to the case of the GradientOptimizer
.
GeometricMachineLearning.MultiHeadAttention
— TypeMultiHeadAttention (MHA) serves as a preprocessing step in the transformer. It reweights the input vectors based on correlations within those data.
Constructor
Takes input arguments:
- dim::Int: The system dimension.
- n_heads::Int: The number of heads.
- Stiefel::Bool=true (keyword argument): whether the weights should be put on the Stiefel manifold.
- retraction::AbstractRetraction (keyword argument): what kind of retraction should be used. By default this is the geodesic retraction.
- add_connection::Bool=true (keyword argument): determines if the input should be added to the output for the final result.
GeometricMachineLearning.NetworkLoss
— TypeAn abstract type for all the neural network losses. If you want to implement CustomLoss <: NetworkLoss you need to define a functor:
(loss::CustomLoss)(model, ps, input, output)
where model is an instance of an AbstractExplicitLayer or a Chain and ps are the parameters.
GeometricMachineLearning.NeuralNetworkIntegrator
— TypeThis is a super type of various neural network architectures such as SympNet
and ResNet
whose purpose is to approximate the flow of an ordinary differential equation (ODE).
GeometricMachineLearning.NonLinearSymplecticDecoder
— TypeThe NonLinearSymplecticDecoder
.
This is typically generated by calling decoder
on an instance of SymplecticAutoencoder
.
GeometricMachineLearning.NonLinearSymplecticEncoder
— TypeThe NonLinearSymplecticEncoder
.
This is typically generated by calling encoder
on an instance of SymplecticAutoencoder
.
GeometricMachineLearning.Optimizer
— TypeOptimizer(method, cache, step, retraction)
Store the method (e.g. AdamOptimizer with corresponding hyperparameters), the cache (e.g. AdamCache), the optimization step and the retraction.
It takes as input an optimization method and the parameters of a network.
For technical reasons we first specify an OptimizerMethod that stores all the hyperparameters of the optimizer.
GeometricMachineLearning.Optimizer
— MethodA functor for Optimizer. It is called with:
- nn::NeuralNetwork
- dl::DataLoader
- batch::Batch
- n_epochs::Int
- loss
The last argument is a function through which Zygote differentiates. This argument is optional; if it is not supplied GeometricMachineLearning defaults to an appropriate loss for the DataLoader.
GeometricMachineLearning.Optimizer
— MethodOptimizer(method, nn::NeuralNetwork)
Allocate the cache for a specific method
and a NeuralNetwork
for an instance of Optimizer
.
Internally this calls Optimizer(method, nn.params)
.
Typically the Optimizer is not initialized with the network parameters, but instead with a NeuralNetwork struct.
GeometricMachineLearning.Optimizer
— MethodOptimizer(method, nn_params)
Allocate the cache for a specific method
and nn_params
for an instance of Optimizer
.
Internally this calls init_optimizer_cache
.
Arguments
The optional keyword argument is the retraction. By default this is cayley
.
GeometricMachineLearning.OptimizerMethod
— TypeEach Optimizer
has to be called with an OptimizerMethod
. This specifies how the neural network weights are updated in each optimization step.
GeometricMachineLearning.PSDArch
— TypeThe architecture
Proper symplectic decomposition (PSD) can be seen as a SymplecticAutoencoder for which the decoder and the encoder are both PSD-like matrices (see the docs for PSDLayer).
Training
For optimizing the parameters in this architecture no neural network training is necessary (see the docs for solve!).
The constructor
The constructor only takes two arguments as input:
- full_dim::Integer
- reduced_dim::Integer
GeometricMachineLearning.PSDLayer
— TypeThis is a PSD-like layer used for symplectic autoencoders. One layer has the following shape:
\[A = \begin{bmatrix} \Phi & \mathbb{O} \\ \mathbb{O} & \Phi \end{bmatrix},\]
where $\Phi$ is an element of the Stiefel manifold $St(n, N)$.
The constructor of PSDLayer is called by PSDLayer(M, N; retraction=retraction):
- M is the input dimension.
- N is the output dimension.
- retraction is an instance of a struct with supertype AbstractRetraction. The only options at the moment are Geodesic() and Cayley().
GeometricMachineLearning.QPT
— TypeThe type for data in $(q, p)$ coordinates.
GeometricMachineLearning.ReducedLoss
— TypeReducedLoss(encoder, decoder)
Make an instance of ReducedLoss based on an Encoder and a Decoder.
This loss should be used together with a NeuralNetworkIntegrator or TransformerIntegrator.
The loss is computed as:
\[\mathrm{loss}_{\mathcal{E}, \mathcal{D}}(\mathcal{NN}, \mathrm{input}, \mathrm{output}) = ||\mathcal{D}(\mathcal{NN}(\mathcal{E}(\mathrm{input}))) - \mathrm{output}||,\]
where $\mathcal{E}$ is the Encoder and $\mathcal{D}$ is the Decoder; $\mathcal{NN}$ is the neural network we compute the loss of.
GeometricMachineLearning.ReducedLoss
— MethodReducedLoss(autoencoder)
Make an instance of ReducedLoss
based on a neural network of type AutoEncoder
.
GeometricMachineLearning.ResNet
— TypeA ResNet is a neural network that realizes a mapping of the form: $x = \mathcal{NN}(x) + x$, so the input is again added to the output (a so-called add connection). In GeometricMachineLearning the specific ResNet that we use consists of a series of simple ResNetLayers.
GeometricMachineLearning.ResNetLayer
— TypeThe ResNetLayer is a simple feedforward neural network to which we add the input after applying it, i.e. it realizes $x \mapsto x + \sigma(Ax + b)$.
GeometricMachineLearning.SkewSymMatrix
— TypeSkewSymMatrix(S::AbstractVector, n::Integer)
Instantiate a skew-symmetric matrix with information stored in vector S.
A skew-symmetric matrix $A$ is a matrix with $A^T = -A$.
Internally the struct saves a vector $S$ of size $n(n-1)\div2$. The conversion is done the following way:
\[[A]_{ij} = \begin{cases} 0 & \text{if $i=j$,} \\ S[( (i-2) (i-1) ) \div 2 + j] & \text{if $i>j$,}\\ -S[( (j-2) (j-1) ) \div 2 + i] & \text{else}. \end{cases}\]
Also see SymmetricMatrix, LowerTriangular and UpperTriangular.
Examples
using GeometricMachineLearning
S = [1, 2, 3, 4, 5, 6]
SkewSymMatrix(S, 4)
# output
4×4 SkewSymMatrix{Int64, Vector{Int64}}:
0 -1 -2 -4
1 0 -3 -5
2 3 0 -6
4 5 6 0
GeometricMachineLearning.SkewSymMatrix
— MethodSkewSymMatrix(A::AbstractMatrix)
Perform 0.5 * (A - A') and store the matrix in an efficient way (as a vector with $n(n-1)/2$ entries).
If the constructor is called with a matrix as input it returns a skew-symmetric matrix via the projection:
\[A \mapsto \frac{1}{2}(A - A^T).\]
Examples
using GeometricMachineLearning
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
SkewSymMatrix(M)
# output
4×4 SkewSymMatrix{Float64, Vector{Float64}}:
0.0 -1.5 -3.0 -4.5
1.5 0.0 -1.5 -3.0
3.0 1.5 0.0 -1.5
4.5 3.0 1.5 0.0
Extend help
Note that the constructor is designed in such a way that it always returns matrices of type SkewSymMatrix{<:AbstractFloat} when called with a matrix, even if this matrix is of type AbstractMatrix{<:Integer}.
If the user wishes to allocate a matrix SkewSymMatrix{<:Integer} the constructor SkewSymMatrix(::AbstractVector, n::Integer) has to be called.
GeometricMachineLearning.StandardTransformerIntegrator
— TypeThe regular transformer used as an integrator (multi-step method).
The constructor is called with one argument:
- sys_dim::Int
The following are keyword arguments:
- transformer_dim::Int: the default is transformer_dim = sys_dim.
- n_blocks::Int: The default is 1.
- n_heads::Int: the number of heads in the multihead attention layer (default is n_heads = sys_dim).
- L::Int: the number of transformer blocks (default is L = 2).
- upscaling_activation: by default identity.
- resnet_activation: by default tanh.
- add_connection::Bool=true: if the input should be added to the output.
GeometricMachineLearning.StiefelLayer
— TypeDefines a layer that performs simple multiplication with an element of the Stiefel manifold.
GeometricMachineLearning.StiefelLieAlgHorMatrix
— TypeStiefelLieAlgHorMatrix(A::SkewSymMatrix{T}, B::AbstractMatrix{T}, N::Integer, n::Integer) where T
Build an instance of StiefelLieAlgHorMatrix based on a skew-symmetric matrix A and an arbitrary matrix B.
StiefelLieAlgHorMatrix is the horizontal component of the Lie algebra of skew-symmetric matrices (with respect to the canonical metric). The projection here is $\pi:S \to SE$, where
\[E = \begin{pmatrix} \mathbb{I}_{n} \\ \mathbb{O}_{(N-n)\times{}n} \end{pmatrix}.\]
The matrix $E$ is implemented under StiefelProjection in GeometricMachineLearning.
An element of StiefelLieAlgHorMatrix takes the form:
\[\begin{pmatrix} A & B^T \\ B & \mathbb{O} \end{pmatrix},\]
where $A$ is skew-symmetric (this is SkewSymMatrix in GeometricMachineLearning).
Also see GrassmannLieAlgHorMatrix.
GeometricMachineLearning.StiefelLieAlgHorMatrix
— MethodStiefelLieAlgHorMatrix(D::AbstractMatrix, n::Integer)
Take a big matrix as input and build an instance of StiefelLieAlgHorMatrix belonging to the Stiefel manifold $St(n, N)$ where $N$ is the number of rows of D.
If the constructor is called with a big $N\times{}N$ matrix, then the projection is performed the following way:
\[\begin{pmatrix} A & B_1 \\ B_2 & D \end{pmatrix} \mapsto \begin{pmatrix} \mathrm{skew}(A) & -B_2^T \\ B_2 & \mathbb{O} \end{pmatrix}.\]
The operation $\mathrm{skew}:\mathbb{R}^{n\times{}n}\to\mathcal{S}_\mathrm{skew}(n)$ is the skew-symmetrization operation. This is equivalent to calling SkewSymMatrix with an $n\times{}n$ matrix.
This can also be seen as the operation:
\[D \mapsto \Omega(E, DE) = \mathrm{skew}\left(2 \left(\mathbb{I} - \frac{1}{2} E E^T \right) DE E^T\right).\]
Also see GeometricMachineLearning.Ω.
GeometricMachineLearning.StiefelManifold
— TypeAn implementation of the Stiefel manifold [7]. The Stiefel manifold is the collection of all matrices $Y\in\mathbb{R}^{N\times{}n}$ whose columns are orthonormal, i.e.
\[ St(n, N) = \{Y: Y^TY = \mathbb{I}_n \}.\]
The Stiefel manifold can be shown to have manifold structure (as the name suggests) and this is heavily used in GeometricMachineLearning. It is further a compact space. More information can be found in the docstrings for rgrad(::StiefelManifold, ::AbstractMatrix) and metric(::StiefelManifold, ::AbstractMatrix, ::AbstractMatrix).
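A short check of the defining property (a sketch; rand for manifold types is documented further below):
using GeometricMachineLearning
using LinearAlgebra: I
Y = rand(StiefelManifold{Float64}, 5, 3)
Y' * Y ≈ I(3)
# output
true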
GeometricMachineLearning.StiefelProjection
— TypeOuter constructor for StiefelProjection
. This works with two integers as input and optionally the type.
GeometricMachineLearning.StiefelProjection
— TypeStiefelProjection(backend, T, N, n)
Make a matrix of the form $\begin{bmatrix} \mathbb{I} & \mathbb{O} \end{bmatrix}^T$ for a specific backend and data type.
An array that essentially does vcat(I(n), zeros(N-n, n))
with GPU support.
Extend help
Technically this should be a subtype of StiefelManifold
.
GeometricMachineLearning.StiefelProjection
— MethodStiefelProjection(A::AbstractMatrix)
Extract necessary information from A and build an instance of StiefelProjection.
Necessary information here refers to the backend, the data type and the size of the matrix.
GeometricMachineLearning.StiefelProjection
— MethodStiefelProjection(B::AbstractLieAlgHorMatrix)
Extract necessary information from B and build an instance of StiefelProjection.
Necessary information here refers to the backend, the data type and the size of the matrix. The size is queried through B.N and B.n.
Examples
using GeometricMachineLearning
B₁ = rand(StiefelLieAlgHorMatrix, 5, 2)
B₂ = rand(GrassmannLieAlgHorMatrix, 5, 2)
E = [1. 0.; 0. 1.; 0. 0.; 0. 0.; 0. 0.]
StiefelProjection(B₁) ≈ StiefelProjection(B₂) ≈ E
# output
true
GeometricMachineLearning.SymmetricMatrix
— TypeSymmetricMatrix(S::AbstractVector, n::Integer)
Instantiate a symmetric matrix with information stored in vector S.
A SymmetricMatrix $A$ is a matrix with $A^T = A$.
Internally the struct saves a vector $S$ of size $n(n+1)\div2$. The conversion is done the following way:
\[[A]_{ij} = \begin{cases} S[( (i-1) i ) \div 2 + j] & \text{if $i\geq{}j$}\\ S[( (j-1) j ) \div 2 + i] & \text{else}. \end{cases}\]
So $S$ stores a string of vectors taken from $A$: $S = [\tilde{a}_1, \tilde{a}_2, \ldots, \tilde{a}_n]$ with $\tilde{a}_i = [[A]_{i1},[A]_{i2},\ldots,[A]_{ii}]$.
Also see SkewSymMatrix, LowerTriangular and UpperTriangular.
Examples
using GeometricMachineLearning
S = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
SymmetricMatrix(S, 4)
# output
4×4 SymmetricMatrix{Int64, Vector{Int64}}:
1 2 4 7
2 3 5 8
4 5 6 9
7 8 9 10
GeometricMachineLearning.SymmetricMatrix
— MethodSymmetricMatrix(A::AbstractMatrix)
Perform 0.5 * (A + A') and store the matrix in an efficient way (as a vector with $n(n+1)/2$ entries).
If the constructor is called with a matrix as input it returns a symmetric matrix via the projection:
\[A \mapsto \frac{1}{2}(A + A^T).\]
Examples
using GeometricMachineLearning
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
SymmetricMatrix(M)
# output
4×4 SymmetricMatrix{Float64, Vector{Float64}}:
1.0 3.5 6.0 8.5
3.5 6.0 8.5 11.0
6.0 8.5 11.0 13.5
8.5 11.0 13.5 16.0
Extend help
Note that the constructor is designed in such a way that it always returns matrices of type SymmetricMatrix{<:AbstractFloat} when called with a matrix, even if this matrix is of type AbstractMatrix{<:Integer}.
If the user wishes to allocate a matrix SymmetricMatrix{<:Integer} the constructor SymmetricMatrix(::AbstractVector, n::Integer) has to be called.
GeometricMachineLearning.SympNet
— TypeThe SympNet type encompasses GSympNets and LASympNets. SympNets are universal approximators of symplectic flows, i.e. maps $\varphi:\mathbb{R}^{2n}\to\mathbb{R}^{2n}$ for which $(\nabla\varphi)^T\mathbb{J}\nabla\varphi = \mathbb{J}$ holds.
GeometricMachineLearning.SympNetLayer
— TypeImplements the various layers from the SympNet paper [24]. This is a super type of GradientLayer, ActivationLayer and LinearLayer.
For the linear layer, the activation and the bias are left out, and for the activation layer $K$ and $b$ are left out!
GeometricMachineLearning.SympNetLayer
— MethodThis is called when a SympNetLayer is applied to a NamedTuple. It calls apply_layer_to_nt_and_return_array.
GeometricMachineLearning.SymplecticAutoencoder
— TypeSymplecticAutoencoder(full_dim, reduced_dim)
Make an instance of SymplecticAutoencoder for dimensions full_dim and reduced_dim (those have to be provided as Integers).
The architecture
The symplectic autoencoder architecture was introduced in [40]. Like any other autoencoder it consists of an encoder $\Psi^e:\mathbb{R}^{2N}\to\mathbb{R}^{2n}$ and a decoder $\Psi^d:\mathbb{R}^{2n}\to\mathbb{R}^{2N}$ with $n\ll{}N$. These satisfy the following properties:
\[\nabla_z\Psi^e\mathbb{J}_{2N}(\nabla_z\Psi^e\mathbb{J}_{2N})^T = \mathbb{J}_{2n} \text{ and } (\nabla_\xi\Psi^d)^T\mathbb{J}_{2N}\nabla_\xi\Psi^d = \mathbb{J}_{2n}.\]
Because the decoder has this particular property, the reduced system can be described by the Hamiltonian $H\circ\Psi^d$:
\[\mathbb{J}_{2n}\nabla_\xi(H\circ\Psi^d) = \mathbb{J}_{2n}(\nabla_\xi\Psi^d)^T\nabla_{\Psi^d(\xi)}H = \mathbb{J}_{2n}(\nabla_\xi\Psi^d)^T\mathbb{J}_{2N}^T\mathbb{J}_{2N}\nabla_{\Psi^d(\xi)}H = (\nabla_\xi\Psi^d)^+X_H(\Psi^d(\xi)),\]
where $(\nabla_\xi\Psi^d)^+$ is the pseudoinverse of $\nabla_\xi\Psi^d$ (for more details see the docs on the AutoEncoder type).
Arguments
You can provide the following keyword arguments:
- n_encoder_layers::Integer = 4: The number of layers in one encoder block.
- n_encoder_blocks::Integer = 2: The number of encoder blocks.
- n_decoder_layers::Integer = 1: The number of layers in one decoder block.
- n_decoder_blocks::Integer = 3: The number of decoder blocks.
- sympnet_upscale::Integer = 5: The upscaling dimension of the GSympNet. See GradientLayerQ and GradientLayerP.
- activation = tanh: The activation in the gradient layers.
- encoder_init_q::Bool = true: Specifies if the first layer in each encoder block should be of $q$ type.
- decoder_init_q::Bool = true: Specifies if the first layer in each decoder block should be of $q$ type.
GeometricMachineLearning.SymplecticDecoder
— TypeAbstract SymplecticDecoder type.
See Decoder for the super type and NonLinearSymplecticDecoder for a derived struct.
GeometricMachineLearning.SymplecticEncoder
— TypeAbstract SymplecticEncoder type.
See Encoder for the super type and NonLinearSymplecticEncoder for a derived struct.
GeometricMachineLearning.SymplecticPotential
— TypeSymplecticPotential(n)
Returns the symplectic matrix of size $2n \times 2n$:
\[\begin{pmatrix} \mathbb{O} & \mathbb{I} \\ -\mathbb{I} & \mathbb{O} \\ \end{pmatrix}\]
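In plain Julia this matrix can be written down directly (a sketch for n = 2, not the package implementation):
using LinearAlgebra: I
n = 2
𝕁 = [zeros(n, n) Matrix(1.0I, n, n); -Matrix(1.0I, n, n) zeros(n, n)]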
GeometricMachineLearning.TrainingData
— TypeTrainingData stores:
- problem
- shape
- get
- symbols
- dim
- noisemaker
GeometricMachineLearning.TransformerIntegrator
— TypeEncompasses various transformer architectures, such as the VolumePreservingTransformer
and the LinearSymplecticTransformer
.
GeometricMachineLearning.TransformerLoss
— TypeTransformerLoss(seq_length, prediction_window)
Make an instance of the transformer loss.
This is the loss for a transformer network (especially a transformer integrator).
Parameters
The prediction_window specifies how many time steps are predicted into the future. It defaults to the value specified for seq_length.
GeometricMachineLearning.UpperTriangular
— TypeUpperTriangular(S::AbstractVector, n::Int)
Build an upper-triangular matrix from a vector.
An upper-triangular matrix is an $n\times{}n$ matrix that has zeros on the diagonal and in the lower-triangular part.
The data are stored in a vector $S$ similarly to other matrices. See LowerTriangular, SkewSymMatrix and SymmetricMatrix.
The struct has two fields: S and n. The first stores all the entries of the matrix in a sparse fashion (in a vector) and the second is the dimension $n$ for $A\in\mathbb{R}^{n\times{}n}$.
Examples
using GeometricMachineLearning
S = [1, 2, 3, 4, 5, 6]
UpperTriangular(S, 4)
# output
4×4 UpperTriangular{Int64, Vector{Int64}}:
0 1 2 4
0 0 3 5
0 0 0 6
0 0 0 0
GeometricMachineLearning.UpperTriangular
— MethodUpperTriangular(A::AbstractMatrix)
Build an upper-triangular matrix from a matrix.
This is done by taking the upper right of that matrix.
Examples
using GeometricMachineLearning
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
UpperTriangular(M)
# output
4×4 UpperTriangular{Int64, Vector{Int64}}:
0 2 3 4
0 0 7 8
0 0 0 12
0 0 0 0
GeometricMachineLearning.VolumePreservingAttention
— TypeVolume-preserving attention (single head attention)
Drawbacks:
- the super-fast activation is only implemented for sequence lengths of 2, 3, 4 and 5.
- other sequence lengths only work on CPU for now (an LU decomposition has to be implemented to work for tensors in parallel).
Constructor
The constructor is called with:
- dim::Int: The system dimension.
- seq_length::Int: The sequence length to be considered. The default is zero, i.e. arbitrary sequence lengths; this works for all sequence lengths but doesn't apply the super-fast activation.
- skew_sym::Bool (keyword argument): specifies whether the weight matrix is skew-symmetric or arbitrary (default is false).
Functor
Applying a layer of type VolumePreservingAttention does the following:
- First we perform the operation $X \mapsto X^T A X =: C$, where $X\in\mathbb{R}^{N\times\mathtt{seq\_length}}$ is a matrix containing time series data and $A$ is the skew-symmetric matrix associated with the layer.
- In a second step we compute the Cayley transform of $C$: $\Lambda = \mathrm{Cayley}(C)$.
- The output of the layer is then $X\Lambda$.
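A plain-matrix sketch of these three steps (assuming one common convention for the Cayley transform; the values of A and X are hypothetical):
using LinearAlgebra: I
A = let S = randn(4, 4); (S - S') / 2 end   # skew-symmetric weight
X = randn(4, 3)                             # N = 4, seq_length = 3
C = X' * A * X                              # C is skew-symmetric
Λ = (I - C / 2) \ (I + C / 2)               # Cayley transform: orthogonal with determinant 1
output = X * Λ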
GeometricMachineLearning.VolumePreservingFeedForward
— TypeRealizes a volume-preserving neural network as a combination of VolumePreservingLowerLayer
and VolumePreservingUpperLayer
.
Constructor
The constructor is called with the following arguments:
- sys_dim::Int: The system dimension.
- n_blocks::Int: The number of blocks in the neural network (containing linear layers and nonlinear layers). Default is 1.
- n_linear::Int: The number of linear VolumePreservingLowerLayers and VolumePreservingUpperLayers in one block. Default is 1.
- activation: The activation function for the nonlinear layers in a block.
- init_upper::Bool=false (keyword argument): Specifies if the first layer is lower or upper.
GeometricMachineLearning.VolumePreservingFeedForwardLayer
— TypeSuper-type of VolumePreservingLowerLayer
and VolumePreservingUpperLayer
. The layers do the following:
\[x \mapsto \begin{cases} \sigma(Lx + b) & \text{where $L$ is }\mathtt{LowerTriangular} \\ \sigma(Ux + b) & \text{where $U$ is }\mathtt{UpperTriangular}. \end{cases}\]
The functor can be applied to a vector, a matrix or a tensor.
Constructor
The constructors are called with:
- sys_dim::Int: the system dimension.
- activation=tanh: the activation function.
- include_bias::Bool=true (keyword argument): specifies whether a bias should be used.
GeometricMachineLearning.VolumePreservingLowerLayer
— TypeSee the documentation for VolumePreservingFeedForwardLayer
.
GeometricMachineLearning.VolumePreservingTransformer
— TypeThe volume-preserving transformer with the Cayley activation function and built-in upscaling.
Constructor
The arguments for the constructor are:
- sys_dim::Int
- seq_length::Int: The sequence length of the data fed into the transformer.
The following are keyword arguments:
- n_blocks::Int=1: The number of blocks in one transformer unit (containing linear layers and nonlinear layers).
- n_linear::Int=1: The number of linear VolumePreservingLowerLayers and VolumePreservingUpperLayers in one block.
- L::Int=1: The number of transformer units.
- activation=tanh: The activation function.
- init_upper::Bool=false: Specifies if the network first acts on the $q$ component.
- skew_sym::Bool=false: specifies whether the weight matrix is skew-symmetric or arbitrary.
GeometricMachineLearning.VolumePreservingUpperLayer
— TypeSee the documentation for VolumePreservingFeedForwardLayer
.
AbstractNeuralNetworks.update!
— Methodupdate!(o, cache, dx::AbstractArray)
Update the cache based on the gradient information dx, compute the final velocity and store it in dx.
The optimizer o is needed because some updating schemes (such as AdamOptimizer) also need information on the current time step.
AbstractNeuralNetworks.update!
— Methodupdate!(o, cache, B)
First update the cache and then update the array B based on the optimizer o.
Note that $B\in\mathfrak{g}^\mathrm{hor}$ in general.
AbstractNeuralNetworks.update!
— Methodupdate!(o::Optimizer{<:BFGSOptimizer}, C, B)
Perform an update with the BFGS optimizer.
C is the cache, B contains the gradient information (the output of global_rep in general).
First we compute the final velocity with
vecS = -o.method.η * C.H * vec(B)
and then we update H
C.H .= (𝕀 - ρ * SY) * C.H * (𝕀 - ρ * SY') + ρ * vecS * vecS'
where SY is vecS * Y' and 𝕀 is the identity.
Implementation
For stability we use δ for computing ρ:
ρ = 1. / (vecS' * Y + o.method.δ)
This is similar to the AdamOptimizer.
Extend Help
If we have weights on a Manifold then the updates are slightly more difficult. In this case the vec operation has to be generalized to the corresponding global tangent space.
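A plain-array sketch of this update (hypothetical values; Y, the difference of gradients, is assumed given):
using LinearAlgebra: I
η, δ = 0.1, 1e-8
H = Matrix(1.0I, 2, 2)            # inverse-Hessian cache
B = [1.0, 2.0]                    # gradient information
Y = [0.5, 0.5]                    # difference of gradients
vecS = -η * H * B                 # final velocity
ρ = 1.0 / (vecS' * Y + δ)
SY = vecS * Y'
H .= (I - ρ * SY) * H * (I - ρ * SY') + ρ * vecS * vecS'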
Base.:*
— Method*(λY, Y)
Apply the element λY onto Y.
Here λY is an element of a Lie group and Y is an element of a homogeneous space.
Base.iterate
— MethodThis function computes a trajectory for a Transformer that has already been trained, for validation purposes.
It takes as input:
- nn: a NeuralNetwork (that has been trained).
- ics: initial conditions (a matrix in $\mathbb{R}^{2n\times\mathtt{seq\_length}}$ or a NamedTuple of two matrices in $\mathbb{R}^{n\times\mathtt{seq\_length}}$).
- n_points::Int=100 (keyword argument): The number of steps for which we run the prediction.
- prediction_window::Int=size(ics.q, 2): The prediction window (i.e. the number of steps we predict into the future) is equal to the sequence length (i.e. the number of input time steps) by default.
Base.iterate
— MethodThis function computes a trajectory for a SympNet that has already been trained, for validation purposes.
It takes as input:
- nn: a NeuralNetwork (that has been trained).
- ics: initial conditions (a NamedTuple of two vectors).
Base.rand
— Methodrand(backend::KernelAbstractions.Backend, manifold_type::Type{MT}, N::Integer, n::Integer) where MT <: Manifold)
Draw random elements for a specific device.
Examples
using GeometricMachineLearning
using GeometricMachineLearning: _round # hide
import Random
Random.seed!(123)
N, n = 5, 3
Y = rand(CPU(), StiefelManifold{Float32}, N, n)
_round(Y; digits = 5) # hide
# output
5×3 StiefelManifold{Float32, Matrix{Float32}}:
-0.27575 0.32991 0.77275
-0.62485 -0.33224 -0.0686
-0.69333 0.36724 -0.18988
-0.09295 -0.73145 0.46064
0.2102 0.33301 0.38717
Random elements of the manifold can also be allocated on GPU, e.g. via
rand(CUDABackend(), StiefelManifold{Float32}, N, n)
for drawing elements on a CUDA device.
Base.rand
— Methodrand(manifold_type::Type{MT}, N::Integer, n::Integer) where MT <: Manifold
Draw random elements from the Stiefel and the Grassmann manifold.
Because both of these manifolds are compact spaces we can sample them uniformly [8].
Examples
When we call ...
using GeometricMachineLearning
using GeometricMachineLearning: _round # hide
import Random
Random.seed!(123)
N, n = 5, 3
Y = rand(StiefelManifold{Float32}, N, n)
_round(Y; digits = 5) # hide
# output
5×3 StiefelManifold{Float32, Matrix{Float32}}:
-0.27575 0.32991 0.77275
-0.62485 -0.33224 -0.0686
-0.69333 0.36724 -0.18988
-0.09295 -0.73145 0.46064
0.2102 0.33301 0.38717
... the sampling is done by first allocating a random matrix of size $N\times{}n$ via Y = randn(Float32, N, n). We then perform a QR decomposition Q, R = qr(Y) with the qr function from the LinearAlgebra package (this uses Householder reflections internally). The final output are then the first n columns of the Q matrix.
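The procedure just described, as a plain-Julia sketch (not the package internals verbatim):
using LinearAlgebra: qr
import Random
Random.seed!(123)
N, n = 5, 3
A = randn(Float32, N, n)
Q, R = qr(A)
Y = Matrix(Q)[:, 1:n]   # first n columns of Q; Y' * Y ≈ I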
Base.vec
— MethodIf vec is applied onto Triangular, then the output is the associated vector.
Base.vec
— MethodIf vec is applied onto SkewSymMatrix, then the output is the associated vector.
ChainRulesCore.rrule
— MethodThis implements the custom pullback for tensor_transpose_mat_mul.
GeometricMachineLearning.Gradient
— FunctionThis is an old constructor and will be deprecated. For change_q=true it is equivalent to GradientLayerQ; for change_q=false it is equivalent to GradientLayerP.
If full_grad=false then ActivationLayer is called.
GeometricMachineLearning.Transformer
— MethodThe architecture for a "transformer encoder" is essentially taken from arXiv:2010.11929, but with the difference that no layer normalization is employed. This is because we still need to find a generalization of layer normalization to manifolds.
The transformer is called with the following inputs:
- dim: the dimension of the transformer.
- n_heads: the number of heads.
- L: the number of transformer blocks.
In addition we have the following optional arguments:
- activation: the activation function used for the ResNet (tanh by default).
- Stiefel::Bool: if the matrices $P^V$, $P^Q$ and $P^K$ should live on a manifold (false by default).
- retraction: which retraction should be used (Geodesic() by default).
- add_connection::Bool: if the input should be added to the output after the MultiHeadAttention layer is used (true by default).
- use_bias::Bool: if the ResNet should use a bias (true by default).
GeometricMachineLearning.accuracy
— Methodaccuracy(nn, dl)
Compute the accuracy of a neural network classifier.
GeometricMachineLearning.accuracy
— Methodaccuracy(model, ps, dl)
Compute the accuracy of a neural network classifier.
GeometricMachineLearning.apply_layer_to_nt_and_return_array
— MethodThis function is used in the wrappers where the input to the SympNet layers is not a NamedTuple (as it should be) but an AbstractArray.
It converts the Array to a NamedTuple (via assign_q_and_p), then calls the SympNet routine(s) and converts back to an AbstractArray (with vcat).
GeometricMachineLearning.apply_section!
— Methodapply_section!(Y::AT, λY::GlobalSection{T, AT}, Y₂::AT) where {T, AT<:StiefelManifold{T}}
Apply λY to Y₂ and store the result in Y.
This is the inplace version of apply_section.
GeometricMachineLearning.apply_section
— Methodapply_section(λY::GlobalSection{T, AT}, Y₂::AT) where {T, AT <: StiefelManifold{T}}
Apply λY to Y₂.
Mathematically this is the group action of the element $\lambda{}Y\in{}G$ on the element $Y_2$ of the homogeneous space $\mathcal{M}$.
Internally it calls the inplace version apply_section!.
GeometricMachineLearning.assign_batch_kernel!
— MethodTakes as input a batch tensor (to which the data are assigned), the whole data tensor and two vectors params and time_steps that include the specific parameters and time steps we want to assign.
Note that this assigns sequential data! For e.g. being processed by a transformer.
GeometricMachineLearning.assign_output_estimate
— MethodThe function assign_output_estimate is closely related to the transformer. It takes the last prediction_window columns of the output and uses them for the final prediction, i.e.
\[\mathbb{R}^{N\times{}T}\to\mathbb{R}^{N\times\mathtt{pw}}, \begin{bmatrix} z^{(1)}_1 & \cdots & z^{(T)}_1 \\ \cdots & \cdots & \cdots \\ z^{(1)}_n & \cdots & z^{(T)}_n \end{bmatrix} \mapsto \begin{bmatrix} z^{(T - \mathtt{pw})}_1 & \cdots & z^{(T)}_1 \\ \cdots & \cdots & \cdots \\ z^{(T - \mathtt{pw})}_n & \cdots & z^{(T)}_n\end{bmatrix}.\]
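On a plain array the same cropping is just column indexing (a sketch, not the package implementation):
X = reshape(collect(1.0:12.0), 3, 4)   # N = 3, T = 4
pw = 2
X[:, end - pw + 1:end]                 # the last pw columns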
GeometricMachineLearning.assign_output_kernel!
— MethodThis should be used together with assign_batch_kernel!
. It assigns the corresponding output (i.e. target).
GeometricMachineLearning.assign_q_and_p
— MethodAllocates two new arrays q and p whose first dimension is half of that of the input x. This should also be supplied through the second argument N.
The output is a Tuple containing q and p.
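A plain sketch of the splitting (hypothetical input):
x = [1.0, 2.0, 3.0, 4.0]
N = 2
q, p = x[1:N], x[N + 1:2N]   # q = [1.0, 2.0], p = [3.0, 4.0]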
GeometricMachineLearning.augment_zeros_kernel!
— MethodUsed for differentiating assign_output_estimate (this appears in the loss).
GeometricMachineLearning.build_v_reduced
— MethodBuilds the reduced vector field based on the full vector field for a Hamiltonian system. We derive the reduced vector field via the reduced Hamiltonian: $\tilde{H} := H\circ\Psi^\mathrm{dec}$. We then get
\[\mathbb{J}_{2n}\nabla_\xi\tilde{H} = \mathbb{J}_{2n}(\nabla\Psi^\mathrm{dec})^T\mathbb{J}_{2N}^T\mathbb{J}_{2N}\nabla_z{}H = \mathbb{J}_{2n}(\nabla\Psi^\mathrm{dec})^T\mathbb{J}_{2N}^T \begin{pmatrix} v(z) \\ f(z) \end{pmatrix} = \begin{pmatrix} - (\nabla_p\Psi_q)^Tf(z) + (\nabla_p\Psi_p)^Tv(z) \\ (\nabla_q\Psi_q)^Tf(z) - (\nabla_q\Psi_p)^Tv(z) \end{pmatrix}.\]
GeometricMachineLearning.cayley
— Methodcayley(B::GrassmannLieAlgHorMatrix)
Compute the Cayley retraction of B and multiply it with E (the distinct element of the Stiefel manifold).
GeometricMachineLearning.cayley
— Methodcayley(B::StiefelLieAlgHorMatrix)
Compute the Cayley retraction of B
and multiply it with E
(the distinct element of the Stiefel manifold).
GeometricMachineLearning.cayley
— Methodcayley(Y::Manifold, Δ)
Take as input an element of a manifold Y
and a tangent vector in Δ
in the corresponding tangent space and compute the Cayley retraction.
In different notation: take as input an element $x$ of $\mathcal{M}$ and an element of $T_x\mathcal{M}$ and return $\mathrm{Cayley}(v_x).$ For example:
N, n = 5, 3
Y = rand(StiefelManifold{Float64}, N, n)
Δ = rgrad(Y, rand(N, n))
cayley(Y, Δ)
See the docstring for rgrad
for details on this function.
GeometricMachineLearning.compute_iterations
— MethodThis function gives iterations from the full dimension to the reduced dimension (i.e. the intermediate steps). The iterations are given in ascending order.
GeometricMachineLearning.compute_iterations_for_symplectic_system
— MethodThis function gives iterations from the full dimension to the reduced dimension (i.e. the intermediate steps). The iterations are given in ascending order. Only even steps are allowed here.
GeometricMachineLearning.compute_output_of_mha
— MethodApplies MHA to an abstract matrix. This is the same independent of whether the input is added to the output or not.
GeometricMachineLearning.convert_input_and_batch_indices_to_array
— MethodTakes the output of the batch functor and uses it to create the corresponding array (NamedTuples).
GeometricMachineLearning.convert_input_and_batch_indices_to_array
— MethodTakes the output of the batch functor and uses it to create the corresponding array.
GeometricMachineLearning.crop_array_for_transformer_loss
— MethodThis crops the output array of the neural network so that it conforms with the output it should be compared to. This is needed for the transformer loss.
GeometricMachineLearning.custom_mat_mul
— MethodMultiplies a matrix with a vector, a matrix or a tensor.
GeometricMachineLearning.decoder
— Methoddecoder(nn::NeuralNetwork{<:AutoEncoder})
Obtain the decoder from an AutoEncoder
neural network.
Both the input and the output are neural networks.
GeometricMachineLearning.decoder_layers_from_iteration
— MethodTakes as input the autoencoder architecture and a vector of integers specifying the layer dimensions in the decoder. Has to return a tuple of AbstractExplicitLayer
s.
GeometricMachineLearning.draw_batch!
— MethodThis assigns the batch if the data are in form of a matrix.
GeometricMachineLearning.encoder
— Methodencoder(nn::NeuralNetwork{<:AutoEncoder})
Obtain the encoder from an AutoEncoder
neural network.
Both the input and the output are neural networks.
GeometricMachineLearning.encoder_layers_from_iteration
— MethodTakes as input the autoencoder architecture and a vector of integers specifying the layer dimensions in the encoder. Has to return a tuple of AbstractExplicitLayer
s.
GeometricMachineLearning.geodesic
— Methodgeodesic(B::GrassmannLieAlgHorMatrix)
Compute the geodesic of an element in GrassmannLieAlgHorMatrix
.
GeometricMachineLearning.geodesic
— Methodgeodesic(B::StiefelLieAlgHorMatrix)
Compute the geodesic of an element in StiefelLieAlgHorMatrix
.
Implementation
This uses a computationally efficient version of the matrix exponential. See GeometricMachineLearning.𝔄
.
GeometricMachineLearning.geodesic
— Methodgeodesic(Y::Manifold, Δ)
Take as input an element of a manifold Y
and a tangent vector in Δ
in the corresponding tangent space and compute the geodesic (exponential map).
In different notation: take as input an element $x$ of $\mathcal{M}$ and an element of $T_x\mathcal{M}$ and return $\mathtt{geodesic}(x, v_x) = \exp(v_x).$ For example:
N, n = 5, 3
Y = rand(StiefelManifold{Float64}, N, n)
Δ = rgrad(Y, rand(N, n))
geodesic(Y, Δ)
See the docstring for rgrad
for details on this function.
GeometricMachineLearning.global_rep
— Methodglobal_rep(λY::GlobalSection{T, AT}, Δ::AbstractMatrix{T}) where {T, AT<:GrassmannManifold{T}}
Express Δ
(an element of the tangent space at Y
) as an instance of GrassmannLieAlgHorMatrix
.
The method global_rep
for GrassmannManifold
is similar to that for StiefelManifold
.
Examples
using GeometricMachineLearning
using GeometricMachineLearning: _round
import Random
Random.seed!(123)
Y = rand(GrassmannManifold, 6, 3)
Δ = rgrad(Y, randn(6, 3))
λY = GlobalSection(Y)
_round(global_rep(λY, Δ); digits = 3)
# output
6×6 GrassmannLieAlgHorMatrix{Float64, Matrix{Float64}}:
0.0 0.0 0.0 0.981 -2.058 0.4
0.0 0.0 0.0 -0.424 0.733 -0.919
0.0 0.0 0.0 -1.815 1.409 1.085
-0.981 0.424 1.815 0.0 0.0 0.0
2.058 -0.733 -1.409 0.0 0.0 0.0
-0.4 0.919 -1.085 0.0 0.0 0.0
GeometricMachineLearning.global_rep
— Methodglobal_rep(λY::GlobalSection{T, AT}, Δ::AbstractMatrix{T}) where {T, AT<:StiefelManifold{T}}
Express Δ
(an element of the tangent space at Y
) as an instance of StiefelLieAlgHorMatrix
.
This maps an element from $T_Y\mathcal{M}$ to an element of $\mathfrak{g}^\mathrm{hor}$.
These two spaces are isomorphic, and the isomorphism is established through $\lambda(Y)\in{}G$ via:
\[T_Y\mathcal{M} \to \mathfrak{g}^{\mathrm{hor}}, \Delta \mapsto \lambda(Y)^{-1}\Omega(Y, \Delta)\lambda(Y).\]
Also see GeometricMachineLearning.Ω
.
Examples
using GeometricMachineLearning
using GeometricMachineLearning: _round
import Random
Random.seed!(123)
Y = rand(StiefelManifold, 6, 3)
Δ = rgrad(Y, randn(6, 3))
λY = GlobalSection(Y)
_round(global_rep(λY, Δ); digits = 3)
# output
6×6 StiefelLieAlgHorMatrix{Float64, SkewSymMatrix{Float64, Vector{Float64}}, Matrix{Float64}}:
0.0 0.679 1.925 0.981 -2.058 0.4
-0.679 0.0 0.298 -0.424 0.733 -0.919
-1.925 -0.298 0.0 -1.815 1.409 1.085
-0.981 0.424 1.815 0.0 0.0 0.0
2.058 -0.733 -1.409 0.0 0.0 0.0
-0.4 0.919 -1.085 0.0 0.0 0.0
Implementation
The function global_rep
does not, in fact, perform the entire map $\lambda(Y)^{-1}\Omega(Y, \Delta)\lambda(Y)$ but only
\[\Delta \mapsto \mathrm{skew}(Y^T\Delta),\]
to get the small skew-symmetric matrix and
\[\Delta \mapsto (\lambda(Y)_{[1:N, n:N]}^T \Delta)_{[1:(N-n), 1:n]},\]
for the arbitrary matrix.
GeometricMachineLearning.global_section
— Methodglobal_section(Y::GrassmannManifold)
Compute a matrix of size $N\times(N-n)$ whose columns are orthogonal to the columns in Y
.
The method global_section
for the Grassmann manifold is equivalent to that for the StiefelManifold
(we represent the Grassmann manifold as an embedding in the Stiefel manifold).
See the documentation for global_section(Y::StiefelManifold{T}) where T
.
GeometricMachineLearning.global_section
— Methodglobal_section(Y::StiefelManifold)
Compute a matrix of size $N\times(N-n)$ whose columns are orthogonal to the columns in Y
.
This matrix is also called $Y_\perp$ [6, 10, 11].
Examples
using GeometricMachineLearning
using GeometricMachineLearning: global_section
import Random
Random.seed!(123)
Y = StiefelManifold([1. 0.; 0. 1.; 0. 0.; 0. 0.])
round.(Matrix(global_section(Y)); digits = 3)
# output
4×2 Matrix{Float64}:
0.0 -0.0
0.0 0.0
0.936 -0.353
0.353 0.936
Further note that we convert the QRCompactWYQ
object to a Matrix
before we display it.
Implementation
The implementation is done with a QR decomposition (LinearAlgebra.qr!
). Internally we do:
A = randn(N, N - n) # or the gpu equivalent
A = A - Y.A * (Y.A' * A)
qr!(A).Q
GeometricMachineLearning.init_optimizer_cache
— Methodinit_optimizer_cache(method, x)
Initialize the cache corresponding to the weights x
for a specific method.
Implementation
Wrapper for the functions setup_adam_cache
, setup_momentum_cache
, setup_gradient_cache
, setup_bfgs_cache
. These appear outside of optimizer_caches.jl
because the OptimizerMethods
first have to be defined.
GeometricMachineLearning.init_optimizer_cache
— Methodinit_optimizer_cache(method, x)
Initialize the optimizer cache based on input x
for the given method
.
GeometricMachineLearning.initialize_hessian_inverse
— Methodinitialize_hessian_inverse(B)
Initialize the inverse of the Hessian for various arrays.
Implementation
This requires an implementation of a vectorization operation vec
. This is important for custom arrays.
GeometricMachineLearning.map_index_for_symplectic_potential
— MethodThis assigns the right index for the symplectic potential. To be used with assign_ones_for_symplectic_potential_kernel!
.
GeometricMachineLearning.mat_tensor_mul!
— Methodmat_tensor_mul!(C, A, B)
Multiply the matrix A
onto the tensor B
from the left and store the result in C
.
Also checks the bounds of the input arrays.
The function mat_tensor_mul
calls mat_tensor_mul!
internally.
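A usage sketch of the in-place version, checked against a naive per-slice multiplication (the arrays here are made up for the example):
using GeometricMachineLearning: mat_tensor_mul!
A = [3.0 0.0; 0.0 2.0]
B = [1.0 1.0; 1.0 1.0;;; 2.0 2.0; 2.0 2.0]
C = zeros(2, 2, 2)
mat_tensor_mul!(C, A, B)
C[:, :, 1] == A * B[:, :, 1] && C[:, :, 2] == A * B[:, :, 2]
# output
true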
GeometricMachineLearning.mat_tensor_mul!
— Methodmat_tensor_mul!(C::AbstractArray{T, 3}, A::LowerTriangular{T}, B::AbstractArray{T, 3}) where T
Multiply the lower-triangular matrix A
onto the tensor B
from the left and store the result in C
.
Also checks the bounds of the input arrays.
This performs an efficient multiplication based on the special structure of the lower-triangular matrix A
.
GeometricMachineLearning.mat_tensor_mul!
— Methodmat_tensor_mul!(C::AbstractArray{T, 3}, A::SkewSymMatrix{T}, B::AbstractArray{T, 3}) where T
Multiply the skew-symmetric matrix A
onto the tensor B
from the left and store the result in C
.
Also checks the bounds of the input arrays.
This performs an efficient multiplication based on the special structure of the skew-symmetric matrix A
.
GeometricMachineLearning.mat_tensor_mul!
— Methodmat_tensor_mul!(C::AbstractArray{T, 3}, A::SymmetricMatrix{T}, B::AbstractArray{T, 3}) where T
Multiply the symmetric matrix A
onto the tensor B
from the left and store the result in C
.
Also checks the bounds of the input arrays.
This performs an efficient multiplication based on the special structure of the symmetric matrix A
.
GeometricMachineLearning.mat_tensor_mul!
— Methodmat_tensor_mul!(C::AbstractArray{T, 3}, A::UpperTriangular{T}, B::AbstractArray{T, 3}) where T
Multiply the upper-triangular matrix A
onto the tensor B
from the left and store the result in C
.
Also checks the bounds of the input arrays.
This performs an efficient multiplication based on the special structure of the upper-triangular matrix A
.
GeometricMachineLearning.mat_tensor_mul
— MethodExtend mat_tensor_mul
to a multiplication by the adjoint of an element of StiefelManifold
.
GeometricMachineLearning.mat_tensor_mul
— Methodmat_tensor_mul(A::AbstractMatrix{T}, B::AbstractArray{T, 3}) where T
Multiply the matrix A
onto the tensor B
from the left.
Internally this calls the inplace version mat_tensor_mul!
.
Examples
using GeometricMachineLearning: mat_tensor_mul
B = [1 1 1; 1 1 1; 1 1 1;;; 2 2 2; 2 2 2; 2 2 2]
A = [3 0 0; 0 2 0; 0 0 1]
mat_tensor_mul(A, B)
# output
3×3×2 Array{Int64, 3}:
[:, :, 1] =
3 3 3
2 2 2
1 1 1
[:, :, 2] =
6 6 6
4 4 4
2 2 2
GeometricMachineLearning.mat_tensor_mul
— MethodExtend mat_tensor_mul
to a multiplication by an element of StiefelManifold
.
GeometricMachineLearning.metric
— Methodmetric(Y::GrassmannManifold, Δ₁::AbstractMatrix, Δ₂::AbstractMatrix)
Compute the metric for vectors Δ₁
and Δ₂
at Y
.
The representation of the Grassmann manifold is realized as a quotient space of the Stiefel manifold.
The metric for the Grassmann manifold is:
\[g^{Gr}_Y(\Delta_1, \Delta_2) = g^{St}_Y(\Delta_1, \Delta_2) = \mathrm{Tr}(\Delta_1^T (\mathbb{I} - Y Y^T) \Delta_2) = \mathrm{Tr}(\Delta_1^T \Delta_2).\]
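A sketch that checks the last equality numerically (the Riemannian gradients produced by rgrad are orthogonal to $Y$, so the projector drops out):
using GeometricMachineLearning
using GeometricMachineLearning: metric
using LinearAlgebra: tr
Y = rand(GrassmannManifold, 6, 3)
Δ₁ = rgrad(Y, randn(6, 3))
Δ₂ = rgrad(Y, randn(6, 3))
metric(Y, Δ₁, Δ₂) ≈ tr(Δ₁' * Δ₂)
# output
true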
GeometricMachineLearning.metric
— MethodImplements the canonical Riemannian metric for the Stiefel manifold:
\[g_Y: (\Delta_1, \Delta_2) \mapsto \mathrm{tr}(\Delta_1^T(\mathbb{I} - \frac{1}{2}YY^T)\Delta_2).\]
It is called with:
- Y::StiefelManifold
- Δ₁::AbstractMatrix
- Δ₂::AbstractMatrix
GeometricMachineLearning.number_of_batches
— MethodGives the number of batches. Inputs are of type DataLoader
and Batch
.
Here the big distinction is between data that are time-series like and data that are autoencoder like.
GeometricMachineLearning.onehotbatch
— MethodOne-hot-batch encoding of a vector of integers: $input\in\{0,1,\ldots,9\}^\ell$. The output is a tensor of shape $10\times1\times\ell$.
\[0 \mapsto \begin{bmatrix} 1 & 0 & \ldots & 0 \end{bmatrix}.\]
In more abstract terms: $i \mapsto e_i$.
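A plain-Julia sketch of this encoding (an illustration, not the library kernel; the helper name is made up):
function sketch_onehotbatch(v::AbstractVector{<:Integer})
    out = zeros(Int, 10, 1, length(v))   # 10 classes, one channel, ℓ samples
    for (ℓ, i) in enumerate(v)
        out[i + 1, 1, ℓ] = 1             # integer i maps to the (i+1)-th unit vector
    end
    out
end
size(sketch_onehotbatch([0, 3, 9]))
# output
(10, 1, 3)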
GeometricMachineLearning.optimization_step!
— Methodoptimization_step!(o::Optimizer, λY::NamedTuple, ps::NamedTuple, dx::NamedTuple)
Optimize a neural network consisting of a single AbstractExplicitLayer
.
GeometricMachineLearning.optimization_step!
— Methodoptimization_step!(o::Optimizer, λY::Chain, ps::Tuple, dx::Tuple)
Optimize a neural network built with Chain
.
GeometricMachineLearning.optimization_step!
— Methodoptimization_step!(o, λY, ps, cache, dx)
Update the weights ps
of a layer
based on an Optimizer
, a cache
and first-order derivatives dx
.
The derivatives dx
here are usually obtained via an AD routine by differentiating a loss function, i.e. dx
is $\nabla_xL$.
It calls the function update!
internally; update! has to be implemented for every OptimizerMethod
.
GeometricMachineLearning.optimize_for_one_epoch!
— MethodOptimize for an entire epoch. For this you have to supply:
- an instance of the optimizer.
- the neural network model
- the parameters of the model
- the data (in form of
DataLoader
) - in instance of
Batch
that containsbatch_size
(and optionallyseq_length
)
With the optional argument:
- the loss, which takes the
model
, the parametersps
and an instance ofDataLoader
as input.
The output of optimize_for_one_epoch!
is the average loss over all batches of the epoch:
\[output = \frac{1}{\mathtt{steps\_per\_epoch}}\sum_{t=1}^\mathtt{steps\_per\_epoch}loss(\theta^{(t-1)}).\]
This is done because any reverse differentiation routine always has two outputs: a pullback and the value of the function it is differentiating. In the case of Zygote: loss_value, pullback = Zygote.pullback(ps -> loss(ps), ps)
(if the loss only depends on the parameters).
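A toy illustration of this Zygote pattern (the loss here is made up for the example):
import Zygote
loss(ps) = sum(abs2, ps)  # a toy loss that only depends on the parameters
ps = [1.0, 2.0, 3.0]
loss_value, pullback = Zygote.pullback(loss, ps)
loss_value == 14.0 && pullback(1.0)[1] ≈ 2 .* ps
# output
true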
GeometricMachineLearning.patch_index
— MethodBased on the coordinates i,j this returns the batch index (for the MNIST data set, for now).
GeometricMachineLearning.rgrad
— Methodrgrad(Y::GrassmannManifold, e_grad::AbstractMatrix)
Compute the Riemannian gradient at $Y\in{}Gr(n, N)$.
These gradients have the property that they are orthogonal to the space spanned by $Y$.
The precise form of the mapping is:
\[\mathtt{rgrad}(Y, \nabla{}L) \mapsto \nabla{}L - YY^T\nabla{}L\]
Note the property $Y^T\mathrm{rgrad}(Y, \nabla{}L) = \mathbb{O}.$
Also see rgrad(::StiefelManifold, ::AbstractMatrix)
.
Examples
using GeometricMachineLearning
Y = GrassmannManifold([1 0 ; 0 1 ; 0 0; 0 0])
Δ = [1 2; 3 4; 5 6; 7 8]
rgrad(Y, Δ)
# output
4×2 Matrix{Int64}:
0 0
0 0
5 6
7 8
GeometricMachineLearning.rgrad
— Methodrgrad(Y::StiefelManifold, e_grad::AbstractMatrix)
Compute the Riemannian gradient for the Stiefel manifold at $Y\in{}St(N,n)$ based on $\nabla{}L\in\mathbb{R}^{N\times{}n}$ (the Euclidean gradient).
The function computes the Riemannian gradient with respect to the canonical metric
.
The precise form of the mapping is:
\[\mathtt{rgrad}(Y, \nabla{}L) \mapsto \nabla{}L - Y(\nabla{}L)^TY\]
Note the property $Y^T\mathrm{rgrad}(Y, \nabla{}L)\in\mathcal{S}_\mathrm{skew}(n).$
Examples
using GeometricMachineLearning
Y = StiefelManifold([1 0 ; 0 1 ; 0 0; 0 0])
Δ = [1 2; 3 4; 5 6; 7 8]
rgrad(Y, Δ)
# output
4×2 Matrix{Int64}:
0 -1
1 0
5 6
7 8
GeometricMachineLearning.solve!
— MethodPSDArch does not require neural network training since it is a strictly linear operation that can be solved with singular value decomposition (SVD).
GeometricMachineLearning.split_and_flatten
— Methodsplit_and_flatten
takes a tensor as input and produces another one as output (essentially rearranges the input data in an intricate way) so that it can easily be processed with a transformer.
The optional arguments are:
- patch_length: by default this is 7.
- number_of_patches: by default this is 16.
GeometricMachineLearning.tensor_mat_mul!
— Methodtensor_mat_mul!(C, A, B)
Multiply the matrix B
onto the tensor A
from the right and store the result in C
.
Also checks the bounds of the input arrays.
The function tensor_mat_mul
calls tensor_mat_mul!
internally.
GeometricMachineLearning.tensor_mat_mul!
— Methodtensor_mat_mul!(C::AbstractArray{T, 3}, B::AbstractArray{T, 3}, A::SymmetricMatrix{T}) where T
Multiply the symmetric matrix A
onto the tensor B
from the right and store the result in C
.
Also checks the bounds of the input arrays.
This performs an efficient multiplication based on the special structure of the symmetric matrix A
.
GeometricMachineLearning.tensor_mat_mul
— Methodtensor_mat_mul(A::AbstractArray{T, 3}, B::AbstractArray{T}) where T
Multiply the matrix B
onto the tensor A
from the right.
Internally this calls the inplace version tensor_mat_mul!
.
Examples
using GeometricMachineLearning: tensor_mat_mul
A = [1 1 1; 1 1 1; 1 1 1;;; 2 2 2; 2 2 2; 2 2 2]
B = [3 0 0; 0 2 0; 0 0 1]
tensor_mat_mul(A, B)
# output
3×3×2 Array{Int64, 3}:
[:, :, 1] =
3 2 1
3 2 1
3 2 1
[:, :, 2] =
6 4 2
6 4 2
6 4 2
GeometricMachineLearning.tensor_mat_skew_sym_assign
— MethodTakes as input:
Z::AbstractArray{T, 3}
: A tensor that stores a bunch of time series.A::AbstractMatrix
: A matrix that is used to perform various scalar products.
For one of these time series the function performs the following computation:
\[ (z^{(i)}, z^{(j)}) \mapsto (z^{(i)})^TAz^{(j)} \text{ for } i > j.\]
The result of this is $n(n-1)\div2$ scalar products per time series. These scalar products are written into a lower-triangular matrix and the final output of the function is a tensor of these lower-triangular matrices.
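A plain-Julia sketch of the scalar products for a single time series (an illustration only; the library computes this with a kernel):
Z = rand(3, 4, 1)  # one time series consisting of four vectors z⁽ⁱ⁾
A = rand(3, 3)
C = [i > j ? Z[:, i, 1]' * A * Z[:, j, 1] : 0.0 for i in 1:4, j in 1:4]
# C is strictly lower-triangular; the library stacks one such matrix per time series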
GeometricMachineLearning.tensor_mat_skew_sym_assign_kernel!
— MethodA kernel that computes the weighted scalar products of all combinations of vectors in the matrix Z
except where the two vectors are the same, and writes the result into a tensor of skew-symmetric matrices C
.
GeometricMachineLearning.train!
— Functiontrain!(...)
Perform the training of a neural network on data using a given training method.
It can be called as:
train!(neuralnetwork, data, optimizer = GradientOptimizer(1e-2), training_method; nruns = 1000, batch_size = default(data, type), showprogress = false)
Arguments
- neuralnetwork::LuxNeuralNetwork: the neural network, using the Lux backend
- data: the data (see TrainingData)
- optimizer = GradientOptimizer: the optimization method (see Optimizer)
- training_method: specifies the loss function used
- nruns: the number of iterations (with the default value shown above)
- batch_size: the size of the batch of data used for each step
GeometricMachineLearning.train!
— Methodtrain!(neuralnetwork, data, optimizer, training_method; nruns = 1000, batch_size, showprogress = false)
Arguments
- neuralnetwork::LuxNeuralNetwork: the neural network, using the Lux backend
- data::AbstractTrainingData: the data
GeometricMachineLearning.within_patch_index
— MethodBased on the coordinates i,j this returns the index within the batch.
GeometricMachineLearning.write_ones_kernel!
— MethodKernel that is needed for functions relating to SymmetricMatrix
and SkewSymMatrix
GeometricMachineLearning.Ω
— MethodΩ(Y::GrassmannManifold{T}, Δ::AbstractMatrix{T}) where T
Perform the canonical horizontal lift for the Grassmann manifold:
\[ \Delta \mapsto \Omega^{St}(Y, \Delta),\]
where $\Omega^{St}$ is the canonical horizontal lift for the Stiefel manifold.
Examples
using GeometricMachineLearning
E = GrassmannManifold(StiefelProjection(5, 2))
Δ = [0. 0.; 0. 0.; 2. 3.; 4. 5.; 6. 7.]
GeometricMachineLearning.Ω(E, Δ)
# output
5×5 SkewSymMatrix{Float64, Vector{Float64}}:
0.0 -0.0 -2.0 -4.0 -6.0
0.0 0.0 -3.0 -5.0 -7.0
2.0 3.0 0.0 -0.0 -0.0
4.0 5.0 0.0 0.0 -0.0
6.0 7.0 0.0 0.0 0.0
GeometricMachineLearning.Ω
— MethodΩ(Y::StiefelManifold{T}, Δ::AbstractMatrix{T}) where T
Perform the canonical horizontal lift for the Stiefel manifold:
\[ \Delta \mapsto (\mathbb{I} - \frac{1}{2}YY^T)\Delta{}Y^T - Y\Delta^T(\mathbb{I} - \frac{1}{2}YY^T).\]
Internally this performs
SkewSymMatrix(2 * (I(n) - .5 * Y * Y') * Δ * Y')
to save memory.
Examples
using GeometricMachineLearning
E = StiefelManifold(StiefelProjection(5, 2))
Δ = [0. -1.; 1. 0.; 2. 3.; 4. 5.; 6. 7.]
GeometricMachineLearning.Ω(E, Δ)
# output
5×5 SkewSymMatrix{Float64, Vector{Float64}}:
0.0 -1.0 -2.0 -4.0 -6.0
1.0 0.0 -3.0 -5.0 -7.0
2.0 3.0 0.0 -0.0 -0.0
4.0 5.0 0.0 0.0 -0.0
6.0 7.0 0.0 0.0 0.0
Note that the output of Ω
is a skew-symmetric matrix, i.e. an element of $\mathfrak{g}$.
GeometricMachineLearning.𝔄
— Method𝔄(A)
Compute $\mathfrak{A}(A) := \sum_{n=1}^\infty \frac{1}{n!} (A)^{n-1}.$
Implementation
This uses a Taylor expansion that iteratively adds terms with
while norm(Aⁿ) > ε
    mul!(A_temp, Aⁿ, A)
    Aⁿ .= A_temp
    rmul!(Aⁿ, inv(n))
    𝔄 += Aⁿ
    n += 1
end
until the norm of Aⁿ
becomes smaller than machine precision. The counter n
in the above algorithm is initialized as 2.
The matrices Aⁿ
and 𝔄
are initialized as the identity matrix.
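A numerical sanity check against the series definition (twelve terms of the sum suffice here since A is small):
using GeometricMachineLearning: 𝔄
A = 0.1 * rand(4, 4)
𝔄(A) ≈ sum(A^(n - 1) / factorial(n) for n in 1:12)
# output
true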
GeometricMachineLearning.𝔄
— Method𝔄(B̂, B̄)
Compute $\mathfrak{A}(B', B'') := \sum_{n=1}^\infty \frac{1}{n!} ((B'')^TB')^{n-1}.$
This expression has the property $\mathbb{I} + B'\mathfrak{A}(B', B'')(B'')^T = \exp(B'(B'')^T).$
Examples
using GeometricMachineLearning
using GeometricMachineLearning: 𝔄
import Random
Random.seed!(123)
B = rand(StiefelLieAlgHorMatrix, 10, 2)
B̂ = hcat(vcat(.5 * B.A, B.B), vcat(one(B.A), zero(B.B)))
B̄ = hcat(vcat(one(B.A), zero(B.B)), vcat(-.5 * B.A, -B.B))
one(B̂ * B̄') + B̂ * 𝔄(B̂, B̄) * B̄' ≈ exp(Matrix(B))
# output
true