GeometricMachineLearning Library Functions
GeometricMachineLearning.QPTOAT
— TypeA union of two types:
const QPTOAT = Union{QPT, AbstractArray}
AbstractNeuralNetworks.Chain
— MethodChain can also be called with a neural network as input.
AbstractNeuralNetworks.Chain
— MethodBuild a chain for an LASympnet for which init_upper_linear is false and init_upper_act is false.
AbstractNeuralNetworks.Chain
— MethodBuild a chain for an LASympnet for which init_upper_linear is false and init_upper_act is true.
AbstractNeuralNetworks.Chain
— MethodBuild a chain for an LASympnet for which init_upper_linear is true and init_upper_act is false.
AbstractNeuralNetworks.Chain
— MethodBuild a chain for an LASympnet for which init_upper_linear is true and init_upper_act is true.
Base.Matrix
— MethodMatrix(λY::GlobalSection)
Put λY into matrix form.
This is not recommended if speed is important! Use apply_section and global_rep instead!
GeometricMachineLearning.AbstractCache
— TypeAbstractCache has subtypes:
All of them can be initialized by providing an array (also supporting manifold types).
GeometricMachineLearning.AbstractLieAlgHorMatrix
— TypeAbstractLieAlgHorMatrix is a supertype for various horizontal components of Lie algebras. We usually call this $\mathfrak{g}^\mathrm{hor}$.
GeometricMachineLearning.AbstractRetraction
— TypeAbstractRetraction is a type that comprises all retraction methods for manifolds. For every manifold layer one has to specify a retraction method that takes the layer and elements of the (global) tangent space.
GeometricMachineLearning.AbstractTriangular
— TypeSee UpperTriangular and LowerTriangular.
GeometricMachineLearning.ActivationLayer
— TypeActivationLayer is the struct corresponding to the constructors ActivationLayerQ and ActivationLayerP. See those for more information.
GeometricMachineLearning.ActivationLayerP
— TypeActivationLayerP(n, σ)
Make an activation layer of size $n$ and with activation $σ$ that only changes the $p$ component.
Performs:
\[\begin{pmatrix} q \\ p \end{pmatrix} \mapsto \begin{pmatrix} q \\ p + \mathrm{diag}(a)\sigma(q) \end{pmatrix}.\]
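As a plain-array sketch (not the package API; σ and the values of a, q and p are assumptions standing in for the learned scaling vector and the activation), the map above reads:
σ = tanh
a = [0.5, 0.5]
q = [1.0, 2.0]; p = [0.0, 0.0]
p_new = p .+ a .* σ.(q)   # only the p component changes: p + diag(a)σ(q)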
GeometricMachineLearning.ActivationLayerQ
— TypeActivationLayerQ(n, σ)
Make an activation layer of size $n$ and with activation $σ$ that only changes the $q$ component.
Performs:
\[\begin{pmatrix} q \\ p \end{pmatrix} \mapsto \begin{pmatrix} q + \mathrm{diag}(a)\sigma(p) \\ p \end{pmatrix}.\]
GeometricMachineLearning.AdamCache
— TypeAdamCache(Y)
Store the first and second moment for Y (initialized as zeros). The first and second moments are called B₁ and B₂.
If the cache is called with an instance of a homogeneous space, e.g. the StiefelManifold $St(n,N)$, it initializes the moments as elements of $\mathfrak{g}^\mathrm{hor}$ (StiefelLieAlgHorMatrix).
Examples
using GeometricMachineLearning
Y = rand(StiefelManifold, 5, 3)
AdamCache(Y).B₁
# output
5×5 StiefelLieAlgHorMatrix{Float64, SkewSymMatrix{Float64, Vector{Float64}}, Matrix{Float64}}:
0.0 -0.0 -0.0 -0.0 -0.0
0.0 0.0 -0.0 -0.0 -0.0
0.0 0.0 0.0 -0.0 -0.0
0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0
GeometricMachineLearning.AdamOptimizer
— TypeAdamOptimizer(η, ρ₁, ρ₂, δ)
Make an instance of the Adam Optimizer.
Here the cache consists of first and second moments that are updated as
\[B_1 \gets ((\rho_1 - \rho_1^t)/(1 - \rho_1^t))\cdot{}B_1 + (1 - \rho_1)/(1 - \rho_1^t)\cdot{}\nabla{}L,\]
and
\[B_2 \gets ((\rho_2 - \rho_2^t)/(1 - \rho_2^t))\cdot{}B_2 + (1 - \rho_2)/(1 - \rho_2^t)\cdot\nabla{}L\odot\nabla{}L.\]
The final velocity is computed as:
\[\mathrm{velocity} \gets -\eta{}B_1/\sqrt{B_2 + \delta}.\]
Implementation
The velocity is stored in the input to save memory.
Algorithm and suggested defaults are taken from [20, page 301].
GeometricMachineLearning.AdamOptimizerWithDecay
— TypeAdamOptimizerWithDecay(n_epochs, η₁=1f-2, η₂=1f-6, ρ₁=9f-1, ρ₂=9.9f-1, δ=1f-8)
Make an instance of the Adam Optimizer with weight decay.
All except the first argument (the number of epochs) have defaults.
The difference to the standard AdamOptimizer is that the learning rate $\eta$ changes in each step. Apart from the time dependence of $\eta$ the two algorithms are equivalent. $\eta(t)$ starts at a high value $\eta_1$ and decreases exponentially until it reaches $\eta_2$, with
\[ \eta(t) = \gamma^t\eta_1,\]
where $\gamma = \exp(\log(\eta_2 / \eta_1) / \mathtt{n\_epochs})$, so that $\eta(\mathtt{n\_epochs}) = \eta_2$.
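A minimal plain-Julia sketch of this schedule (using the stated defaults and a hypothetical epoch count; not the package API) confirms that $\eta$ interpolates between $\eta_1$ and $\eta_2$:
η₁, η₂, n_epochs = 1f-2, 1f-6, 100
γ = exp(log(η₂ / η₁) / n_epochs)
η(t) = γ^t * η₁
(η(0), η(n_epochs))   # ≈ (1f-2, 1f-6)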
GeometricMachineLearning.AutoEncoder
— TypeAn autoencoder [20] is a neural network consisting of an encoder $\Psi^e$ and a decoder $\Psi^d$. In the simplest case they are trained on some data set $\mathcal{D}$ to reduce the following error:
\[||\Psi^d\circ\Psi^e(\mathcal{D}) - \mathcal{D}||,\]
which we call the reconstruction error or autoencoder error (see the docs for AutoEncoderLoss); here $||\cdot||$ is some norm.
Implementation
Abstract AutoEncoder type. If a custom <:AutoEncoder architecture is implemented it should have the fields full_dim, reduced_dim, n_encoder_blocks and n_decoder_blocks. Further the routines encoder, decoder, encoder_parameters and decoder_parameters should be extended.
GeometricMachineLearning.AutoEncoderLoss
— TypeThis loss should always be used together with a neural network of type AutoEncoder (and it is also the default for training such a network).
It simply computes:
\[\mathtt{AutoEncoderLoss}(nn, \mathrm{input}) = ||nn(\mathrm{input}) - \mathrm{input}||.\]
GeometricMachineLearning.BFGSCache
— TypeBFGSCache(B)
Make the cache for the BFGS optimizer based on the array B.
It stores an array for the gradient of the previous time step B and the inverse of the Hessian matrix H.
The cache for the inverse of the Hessian is initialized with the identity. The cache for the previous gradient information is initialized with the zero vector.
Note that the cache for H is changed iteratively, whereas the cache for B is newly assigned at every time step.
GeometricMachineLearning.BFGSOptimizer
— TypeBFGSOptimizer(η, δ)
Make an instance of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimizer.
η is the learning rate and δ is a stabilization parameter.
GeometricMachineLearning.Batch
— TypeBatch is a struct whose functor acts on an instance of DataLoader to produce a sequence of training samples for training for one epoch.
The Constructor
The constructor for Batch is called with:
- batch_size::Int
- seq_length::Int (optional)
- prediction_window::Int (optional)
The first of these arguments is required; it indicates the number of training samples in a batch. If we deal with time series data then we can additionally supply a sequence length and a prediction window as input arguments to Batch. These indicate the number of input vectors and the number of output vectors.
The functor
An instance of Batch can be called on an instance of DataLoader to produce a sequence of samples that contain all the input data, i.e. for training for one epoch. The output of applying batch::Batch to dl::DataLoader is a tuple of vectors of integers. Each of these vectors contains two integers: the first is the time index and the second one is the parameter index.
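A minimal sketch, assuming the constructor and functor behave as described above (the data matrix here is arbitrary):
using GeometricMachineLearning
dl = DataLoader(rand(2, 8))
batch = Batch(2)
batches = batch(dl)   # index batches covering one epoch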
GeometricMachineLearning.BiasLayer
— TypeA bias layer that does nothing more than add a vector to the input. This is needed for LA-SympNets.
GeometricMachineLearning.Classification
— TypeClassification Layer that takes a matrix as an input and returns a vector that is used for MNIST classification.
It has the following arguments:
- M: input dimension
- N: output dimension
- activation: the activation function
And the following optional argument:
- average: If this is set to true, then the output is computed as $\frac{1}{N}\sum_{i=1}^N[\mathrm{input}]_{\bullet{}i}$. If set to false (the default) it picks the last column of the input.
GeometricMachineLearning.ClassificationTransformer
— TypeThis is a transformer neural network for classification purposes. At the moment this is only used for training on MNIST, but can in theory be used for any classification problem.
It has to be called with a DataLoader that stores an input and an output tensor. The optional arguments are:
- n_heads: The number of heads in the MultiHeadAttention (mha) layers. Default: 7.
- n_layers: The number of transformer layers. Default: 16.
- activation: The activation function. Default: softmax.
- Stiefel: Whether the matrices in the mha layers are on the Stiefel manifold.
- add_connection: Whether the input is appended to the output of the mha layer (skip connection).
GeometricMachineLearning.DataLoader
— TypeDataLoader(data)
Make an instance based on a data set.
It is designed to make training convenient.
Constructor
The data loader can be called with various inputs:
- A single vector: If the data loader is called with a single vector (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the second axis indicates parameter values and/or time steps and the system has a single degree of freedom (i.e. the system dimension is one).
- A single matrix: If the data loader is called with a single matrix (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the first axis is assumed to indicate the degrees of freedom of the system and the second axis indicates parameter values and/or time steps.
- A single tensor: If the data loader is called with a single tensor, then this is interpreted as an integration problem with the second axis indicating the time step and the third one indicating the parameters.
- A tensor and a vector: This is a special case (MNIST classification problem). For the MNIST problem for example the input are $n_p$ matrices (first input argument) and $n_p$ integers (second input argument).
- A NamedTuple with fields q and p: The NamedTuple contains (i) two matrices or (ii) two tensors.
- An EnsembleSolution: The EnsembleSolution typically comes from GeometricProblems.
When we supply a single vector or a single matrix as input to DataLoader and further set autoencoder = false (keyword argument), then the data are stored as an integration problem and the second axis is assumed to indicate time steps.
Fields of DataLoader
The fields of the DataLoader struct are the following:
- input: The input data with axes (i) system dimension, (ii) number of time steps and (iii) number of parameters.
- output: The tensor that contains the output (supervised learning); this may be of type Nothing if the constructor is only called with one tensor (unsupervised learning).
- input_dim: The dimension of the system, i.e. what is taken as input by a regular neural network.
- input_time_steps: The length of the entire time series (length of the second axis).
- n_params: The number of parameters that are present in the data set (length of the third axis).
- output_dim: The dimension of the output tensor (first axis). If output is of type Nothing, then this is also of type Nothing.
- output_time_steps: The size of the second axis of the output tensor. If output is of type Nothing, then this is also of type Nothing.
Implementation
Even though DataLoader can be called with inputs of various forms, internally it always stores tensors with three axes.
using GeometricMachineLearning
data = [1 2 3; 4 5 6]
dl = DataLoader(data)
dl.input
# output
[ Info: You have provided a matrix as input. The axes will be interpreted as (i) system dimension and (ii) number of parameters.
2×1×3 Array{Int64, 3}:
[:, :, 1] =
1
4
[:, :, 2] =
2
5
[:, :, 3] =
3
6
GeometricMachineLearning.DataLoader
— MethodDataLoader(data)
Make an instance based on a data set.
It is designed to make training convenient.
Constructor
The data loader can be called with various inputs:
- A single vector: If the data loader is called with a single vector (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the second axis indicates parameter values and/or time steps and the system has a single degree of freedom (i.e. the system dimension is one).
- A single matrix: If the data loader is called with a single matrix (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the first axis is assumed to indicate the degrees of freedom of the system and the second axis indicates parameter values and/or time steps.
- A single tensor: If the data loader is called with a single tensor, then this is interpreted as an integration problem with the second axis indicating the time step and the third one indicating the parameters.
- A tensor and a vector: This is a special case (MNIST classification problem). For the MNIST problem for example the input are $n_p$ matrices (first input argument) and $n_p$ integers (second input argument).
- A NamedTuple with fields q and p: The NamedTuple contains (i) two matrices or (ii) two tensors.
- An EnsembleSolution: The EnsembleSolution typically comes from GeometricProblems.
When we supply a single vector or a single matrix as input to DataLoader and further set autoencoder = false (keyword argument), then the data are stored as an integration problem and the second axis is assumed to indicate time steps.
GeometricMachineLearning.DataLoader
— MethodDataLoader(data::AbstractMatrix)
Make an instance of DataLoader based on a matrix.
Arguments
This has an additional keyword argument autoencoder, which is set to true by default.
GeometricMachineLearning.DataLoader
— MethodConstructor for EnsembleSolution from the package GeometricSolutions with field q.
GeometricMachineLearning.DataLoader
— MethodConstructor for EnsembleSolution from the package GeometricSolutions with fields q and p.
GeometricMachineLearning.Decoder
— TypeAbstract Decoder type.
Implementation
If a custom <:Decoder architecture is implemented it should have the fields full_dim, reduced_dim and n_decoder_blocks.
GeometricMachineLearning.Encoder
— TypeAbstract Encoder type.
Implementation
If a custom <:Encoder architecture is implemented it should have the fields full_dim, reduced_dim and n_encoder_blocks.
GeometricMachineLearning.FeedForwardLoss
— TypeFeedForwardLoss()
Make an instance of a loss for feedforward neural networks.
This doesn't have any parameters.
GeometricMachineLearning.GSympNet
— TypeGSympNet(d)
Make a $G$-SympNet with dimension $d.$
There exists an additional constructor that can be called by supplying an instance of DataLoader.
Arguments
Keyword arguments are:
- upscaling_dimension::Int: The upscaling dimension of the gradient layer. See the documentation for GradientLayerQ and GradientLayerP for further explanation. The default is 2*dim.
- n_layers::Int: The number of layers (i.e. the total number of GradientLayerQ and GradientLayerP). The default is 2.
- activation: The activation function that is applied. By default this is tanh.
- init_upper::Bool: Initialize the gradient layer so that it first modifies the $q$-component. The default is true.
GeometricMachineLearning.GlobalSection
— TypeGlobalSection(Y::AbstractMatrix)
Construct a global section for Y.
A global section $\lambda$ is a mapping from a homogeneous space $\mathcal{M}$ to the corresponding Lie group $G$ such that
\[\lambda(Y)E = Y.\]
Also see apply_section and global_rep.
Implementation
For an implementation of GlobalSection for a custom array (especially manifolds), the function global_section has to be generalized.
GeometricMachineLearning.GradientCache
— TypeGradientCache(Y)
Do not store anything.
The cache for the GradientOptimizer
does not consider past information.
GeometricMachineLearning.GradientLayer
— TypeGradientLayer is the struct corresponding to the constructors GradientLayerQ and GradientLayerP. See those for more information.
GeometricMachineLearning.GradientLayerP
— TypeGradientLayerP(n, upscaling_dimension, activation)
Make an instance of a gradient-$p$ layer.
The gradient layer that changes the $p$ component. It is of the form:
\[\begin{bmatrix} \mathbb{I} & \mathbb{O} \\ \nabla{}V & \mathbb{I} \end{bmatrix},\]
with $V(q) = \sum_{i=1}^Ma_i\Sigma(\sum_jk_{ij}q_j+b_i)$, where $\Sigma$ is the antiderivative of the activation function $\sigma$ (one-layer neural network). We refer to $M$ as the upscaling dimension. Such layers are by construction symplectic.
GeometricMachineLearning.GradientLayerQ
— TypeGradientLayerQ(n, upscaling_dimension, activation)
Make an instance of a gradient-$q$ layer.
The gradient layer that changes the $q$ component. It is of the form:
\[\begin{bmatrix} \mathbb{I} & \nabla{}V \\ \mathbb{O} & \mathbb{I} \end{bmatrix},\]
with $V(p) = \sum_{i=1}^Ma_i\Sigma(\sum_jk_{ij}p_j+b_i)$, where $\Sigma$ is the antiderivative of the activation function $\sigma$ (one-layer neural network). We refer to $M$ as the upscaling dimension. Such layers are by construction symplectic.
GeometricMachineLearning.GradientOptimizer
— TypeGradientOptimizer(η)
Make an instance of a gradient optimizer.
This is the simplest neural network optimizer. It has no cache and computes the final velocity as:
\[ \mathrm{velocity} \gets - \eta\nabla_\mathrm{weight}L.\]
Implementation
The operations are done as memory efficiently as possible. This means the provided $\nabla_WL$ is mutated via:
rmul!(∇L, -method.η)
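The same update on a plain array (a sketch with a hypothetical learning rate η; ∇L is mutated to hold the velocity):
using LinearAlgebra: rmul!
η = 0.1
∇L = [1.0, 2.0]
rmul!(∇L, -η)   # ∇L now equals [-0.1, -0.2]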
GeometricMachineLearning.GrassmannLayer
— TypeDefines a layer that performs simple multiplication with an element of the Grassmann manifold.
GeometricMachineLearning.GrassmannLieAlgHorMatrix
— TypeGrassmannLieAlgHorMatrix(B::AbstractMatrix{T}, N::Integer, n::Integer) where T
Build an instance of GrassmannLieAlgHorMatrix based on an arbitrary matrix B of size $(N-n)\times{}n$.
GrassmannLieAlgHorMatrix is the horizontal component of the Lie algebra of skew-symmetric matrices (with respect to the canonical metric). The projection here is $\pi:S \to SE/\sim$, where
\[E = \begin{pmatrix} \mathbb{I}_{n} \\ \mathbb{O}_{(N-n)\times{}n} \end{pmatrix},\]
and the equivalence relation is
\[V_1 \sim V_2 \iff \exists A\in\mathcal{S}_\mathrm{skew}(n) \text{ such that } V_2 = V_1 + \begin{pmatrix} A \\ \mathbb{O} \end{pmatrix}.\]
An element of GrassmannLieAlgHorMatrix takes the form:
\[\begin{pmatrix} \bar{\mathbb{O}} & B^T \\ B & \mathbb{O} \end{pmatrix},\]
where $\bar{\mathbb{O}}\in\mathbb{R}^{n\times{}n}$ and $\mathbb{O}\in\mathbb{R}^{(N - n)\times{}n}.$
GeometricMachineLearning.GrassmannLieAlgHorMatrix
— MethodGrassmannLieAlgHorMatrix(D::AbstractMatrix, n::Integer)
Take a big matrix as input and build an instance of GrassmannLieAlgHorMatrix belonging to the Grassmann manifold $Gr(n, N)$ where $N$ is the number of rows of D.
If the constructor is called with a big $N\times{}N$ matrix, then the projection is performed the following way:
\[\begin{pmatrix} A & B_1 \\ B_2 & D \end{pmatrix} \mapsto \begin{pmatrix} \bar{\mathbb{O}} & -B_2^T \\ B_2 & \mathbb{O} \end{pmatrix}.\]
This can also be seen as the operation:
\[D \mapsto \Omega(E, DE - EE^TDE),\]
where $\Omega$ is the horizontal lift GeometricMachineLearning.Ω.
GeometricMachineLearning.GrassmannManifold
— TypeThe GrassmannManifold is based on the StiefelManifold.
GeometricMachineLearning.HRedSys
— TypeHRedSys computes the reconstructed dynamics in the full system based on the reduced one. Optionally it can be compared to the FOM solution.
It can be called using the following constructor: HRedSys(N, n; encoder, decoder, v_full, f_full, v_reduced, f_reduced, parameters, tspan, tstep, ics, projection_error)
where
- encoder: a function $\mathbb{R}^{2N}\mapsto{}\mathbb{R}^{2n}$.
- decoder: a (differentiable) function $\mathbb{R}^{2n}\mapsto\mathbb{R}^{2N}$.
- v_full: a (differentiable) mapping defined the same way as in GeometricIntegrators.
- f_full: a (differentiable) mapping defined the same way as in GeometricIntegrators.
- v_reduced: a (differentiable) mapping defined the same way as in GeometricIntegrators.
- f_reduced: a (differentiable) mapping defined the same way as in GeometricIntegrators.
- parameters: a NamedTuple that parametrizes the vector fields (the same for the full and the reduced vector field).
- tspan: a tuple (t₀, tₗ) that specifies the start and end point of the time interval over which integration is performed.
- tstep: the time step.
- ics: the initial condition for the big system.
- projection_error: the error $||M - \mathcal{R}\circ\mathcal{P}(M)||$, where $M$ is the snapshot matrix and $\mathcal{P}$ and $\mathcal{R}$ are the reduction and the reconstruction respectively.
GeometricMachineLearning.LASympNet
— TypeLASympNet(d)
Make an $LA$-SympNet with dimension $d.$
There exists an additional constructor that can be called by supplying an instance of DataLoader.
Arguments
Keyword arguments are:
- depth::Int: The number of linear layers that are applied. The default is 5.
- nhidden::Int: The number of hidden layers (i.e. layers that are not input or output layers). The default is 2.
- activation: The activation function that is applied. By default this is tanh.
- init_upper_linear::Bool: Initialize the linear layer so that it first modifies the $q$-component. The default is true.
- init_upper_act::Bool: Initialize the activation layer so that it first modifies the $q$-component. The default is true.
GeometricMachineLearning.LayerWithManifold
— TypeLayerWithManifold
is a subtype of AbstractExplicitLayer
that contains manifolds as weights.
GeometricMachineLearning.LayerWithOptionalManifold
— TypeLayerWithOptionalManifold
is a subtype of AbstractExplicitLayer
that can contain manifolds as weights.
GeometricMachineLearning.LinearLayer
— TypeLinearLayer is the struct corresponding to the constructors LinearLayerQ and LinearLayerP. See those for more information.
Implementation
LinearLayer uses the custom matrix SymmetricMatrix for an especially efficient implementation.
GeometricMachineLearning.LinearLayerP
— TypeLinearLayerP(n)
Make a linear layer of dimension $n\times{}n$ that only changes the $p$ component.
Equivalent to a left multiplication by the matrix:
\[\begin{pmatrix} \mathbb{I} & \mathbb{O} \\ B & \mathbb{I} \end{pmatrix}, \]
where $B$ is a symmetric matrix.
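A plain-array sketch of this multiplication (not the package API; the values of B, q and p are hypothetical):
B = [2.0 0.5; 0.5 1.0]          # symmetric
q = [1.0, 2.0]; p = [0.0, 1.0]
q_new = q                        # q is unchanged
p_new = B * q .+ p               # p ← Bq + p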
GeometricMachineLearning.LinearLayerQ
— TypeLinearLayerQ(n)
Make a linear layer of dimension $n\times{}n$ that only changes the $q$ component.
Equivalent to a left multiplication by the matrix:
\[\begin{pmatrix} \mathbb{I} & B \\ \mathbb{O} & \mathbb{I} \end{pmatrix}, \]
where $B$ is a symmetric matrix.
GeometricMachineLearning.LinearSymplecticAttention
— TypeImplements the linear symplectic attention layers. Analogous to GradientLayer it performs mappings that only change the $Q$ or the $P$ part. For more information see LinearSymplecticAttentionQ and LinearSymplecticAttentionP.
Constructor
For the constructors simply call
LinearSymplecticAttentionQ(sys_dim, seq_length)
or
LinearSymplecticAttentionP(sys_dim, seq_length)
where sys_dim is the system dimension and seq_length is the sequence length.
GeometricMachineLearning.LinearSymplecticAttentionP
— TypePerforms:
\[\begin{pmatrix} Q \\ P \end{pmatrix} \mapsto \begin{pmatrix} Q \\ P + \nabla_QF \end{pmatrix},\]
where $Q,\, P\in\mathbb{R}^{n\times{}T}$ and $F(Q) = \frac{1}{2}\mathrm{Tr}(Q A Q^T)$.
GeometricMachineLearning.LinearSymplecticAttentionQ
— TypePerforms:
\[\begin{pmatrix} Q \\ P \end{pmatrix} \mapsto \begin{pmatrix} Q + \nabla_PF \\ P \end{pmatrix},\]
where $Q,\, P\in\mathbb{R}^{n\times{}T}$ and $F(P) = \frac{1}{2}\mathrm{Tr}(P A P^T)$.
GeometricMachineLearning.LinearSymplecticTransformer
— TypeRealizes the linear Symplectic Transformer.
Constructor:
The constructor is called with the following arguments:
- dim::Int: system dimension.
- seq_length::Int: number of time steps that the transformer considers.
Optional keyword arguments:
- n_sympnet::Int=2: The number of sympnet layers in the transformer.
- upscaling_dimension::Int=2*dim: The upscaling that is done by the gradient layer.
- L::Int=1: The number of transformer units.
- activation=tanh: The activation function for the SympNet layers.
- init_upper::Bool=true: Specifies if the first layer is a $Q$-type layer (init_upper=true) or if it is a $P$-type layer (init_upper=false).
GeometricMachineLearning.LowerTriangular
— TypeLowerTriangular(S::AbstractVector, n::Int)
Build a lower-triangular matrix from a vector.
A lower-triangular matrix is an $n\times{}n$ matrix that has zeros on the diagonal and in the upper-triangular part.
The data are stored in a vector $S$ similarly to other matrices. See UpperTriangular, SkewSymMatrix and SymmetricMatrix.
The struct has two fields: S and n. The first stores all the entries of the matrix in a sparse fashion (in a vector) and the second is the dimension $n$ for $A\in\mathbb{R}^{n\times{}n}$.
Examples
using GeometricMachineLearning
S = [1, 2, 3, 4, 5, 6]
LowerTriangular(S, 4)
# output
4×4 LowerTriangular{Int64, Vector{Int64}}:
0 0 0 0
1 0 0 0
2 3 0 0
4 5 6 0
GeometricMachineLearning.LowerTriangular
— MethodLowerTriangular(A::AbstractMatrix)
Build a lower-triangular matrix from a matrix.
This is done by taking the lower left of that matrix.
Examples
using GeometricMachineLearning
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
LowerTriangular(M)
# output
4×4 LowerTriangular{Int64, Vector{Int64}}:
0 0 0 0
5 0 0 0
9 10 0 0
13 14 15 0
GeometricMachineLearning.Manifold
— TypeA manifold in GeometricMachineLearning is a subtype of AbstractMatrix. All manifolds are matrix manifolds and are therefore stored as matrices. More details can be found in the docstrings for the StiefelManifold and the GrassmannManifold.
GeometricMachineLearning.ManifoldLayer
— TypeThis defines a manifold layer that has one matrix-valued manifold $A$ associated with it and performs $x\mapsto{}Ax$.
GeometricMachineLearning.MomentumCache
— TypeMomentumCache(Y)
Store the moment for Y (initialized as zeros). The moment is called B.
If the cache is called with an instance of a homogeneous space, e.g. the StiefelManifold $St(n,N)$, it initializes the moment as an element of $\mathfrak{g}^\mathrm{hor}$ (StiefelLieAlgHorMatrix).
See AdamCache.
GeometricMachineLearning.MomentumOptimizer
— TypeMomentumOptimizer(η, α)
Make an instance of the momentum optimizer.
The momentum optimizer is similar to the GradientOptimizer. It however has a nontrivial cache that stores past history (see MomentumCache). The cache is updated via:
\[ B^{\mathrm{cache}} \gets \alpha{}B^{\mathrm{cache}} + \nabla_\mathrm{weights}L\]
and then the final velocity is computed as
\[ \mathrm{velocity} \gets - \eta{}B^{\mathrm{cache}},\]
or the Riemannian manifold equivalent, if applicable.
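A plain-array sketch of these two updates (hypothetical values, no package API assumed):
η, α = 0.1, 0.9
B_cache = zeros(2)
∇L = [1.0, 2.0]
B_cache .= α .* B_cache .+ ∇L    # update the cache
velocity = -η .* B_cache          # final velocity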
Implementation
To save memory the velocity is stored in the input $\nabla_WL$. This is similar to the case of the GradientOptimizer
.
GeometricMachineLearning.MultiHeadAttention
— TypeMultiHeadAttention (MHA) serves as a preprocessing step in the transformer. It reweights the input vectors based on correlations within those data.
Constructor
Takes input arguments:
- dim::Int: The system dimension.
- n_heads::Int: The number of heads.
- Stiefel::Bool=true (keyword argument): whether the weights should be put on the Stiefel manifold.
- retraction::AbstractRetraction (keyword argument): what kind of retraction should be used. By default this is the geodesic retraction.
- add_connection::Bool=true (keyword argument): determines if the input should be added to the output for the final result.
GeometricMachineLearning.NetworkLoss
— TypeAn abstract type for all the neural network losses. If you want to implement CustomLoss <: NetworkLoss you need to define a functor:
(loss::CustomLoss)(model, ps, input, output)
where model is an instance of an AbstractExplicitLayer or a Chain and ps are the parameters.
GeometricMachineLearning.NeuralNetworkIntegrator
— TypeThis is a super type of various neural network architectures such as SympNet
and ResNet
whose purpose is to approximate the flow of an ordinary differential equation (ODE).
GeometricMachineLearning.NonLinearSymplecticDecoder
— TypeThe NonLinearSymplecticDecoder
.
This is typically generated by calling decoder
on an instance of SymplecticAutoencoder
.
GeometricMachineLearning.NonLinearSymplecticEncoder
— TypeThe NonLinearSymplecticEncoder
.
This is typically generated by calling encoder
on an instance of SymplecticAutoencoder
.
GeometricMachineLearning.Optimizer
— TypeOptimizer(method, cache, step, retraction)
Store the method (e.g. AdamOptimizer with corresponding hyperparameters), the cache (e.g. AdamCache), the optimization step and the retraction.
It takes as input an optimization method and the parameters of a network.
For technical reasons we first specify an OptimizerMethod that stores all the hyperparameters of the optimizer.
GeometricMachineLearning.Optimizer
— MethodA functor for Optimizer. It is called with:
- nn::NeuralNetwork
- dl::DataLoader
- batch::Batch
- n_epochs::Int
- loss
The last argument is a function through which Zygote differentiates. This argument is optional; if it is not supplied GeometricMachineLearning defaults to an appropriate loss for the DataLoader.
GeometricMachineLearning.Optimizer
— MethodOptimizer(method, nn::NeuralNetwork)
Allocate the cache for a specific method
and a NeuralNetwork
for an instance of Optimizer
.
Internally this calls Optimizer(method, nn.params)
.
Typically the Optimizer is not initialized with the network parameters, but instead with a NeuralNetwork struct.
GeometricMachineLearning.Optimizer
— MethodOptimizer(method, nn_params)
Allocate the cache for a specific method
and nn_params
for an instance of Optimizer
.
Internally this calls init_optimizer_cache
.
Arguments
The optional keyword argument is the retraction. By default this is cayley
.
GeometricMachineLearning.OptimizerMethod
— TypeEach Optimizer
has to be called with an OptimizerMethod
. This specifies how the neural network weights are updated in each optimization step.
GeometricMachineLearning.PSDArch
— TypeThe architecture
Proper symplectic decomposition (PSD) can be seen as a SymplecticAutoencoder for which the decoder and the encoder are both PSD-like matrices (see the docs for PSDLayer).
Training
For optimizing the parameters in this architecture no neural network training is necessary (see the docs for solve!).
The constructor
The constructor only takes two arguments as input:
- full_dim::Integer
- reduced_dim::Integer
GeometricMachineLearning.PSDLayer
— TypeThis is a PSD-like layer used for symplectic autoencoders. One layer has the following shape:
\[A = \begin{bmatrix} \Phi & \mathbb{O} \\ \mathbb{O} & \Phi \end{bmatrix},\]
where $\Phi$ is an element of the Stiefel manifold $St(n, N)$.
The constructor of PSDLayer is called by PSDLayer(M, N; retraction=retraction):
- M is the input dimension.
- N is the output dimension.
- retraction is an instance of a struct with supertype AbstractRetraction. The only options at the moment are Geodesic() and Cayley().
GeometricMachineLearning.QPT
— TypeThe type for data in $(q, p)$ coordinates.
GeometricMachineLearning.ReducedLoss
— TypeReducedLoss(encoder, decoder)
Make an instance of ReducedLoss based on an Encoder and a Decoder.
This loss should be used together with a NeuralNetworkIntegrator or TransformerIntegrator.
The loss is computed as:
\[\mathrm{loss}_{\mathcal{E}, \mathcal{D}}(\mathcal{NN}, \mathrm{input}, \mathrm{output}) = ||\mathcal{D}(\mathcal{NN}(\mathcal{E}(\mathrm{input}))) - \mathrm{output}||,\]
where $\mathcal{E}$ is the Encoder and $\mathcal{D}$ is the Decoder; $\mathcal{NN}$ is the neural network we compute the loss of.
GeometricMachineLearning.ReducedLoss
— MethodReducedLoss(autoencoder)
Make an instance of ReducedLoss
based on a neural network of type AutoEncoder
.
GeometricMachineLearning.ResNet
— TypeA ResNet is a neural network that realizes a mapping of the form: $x = \mathcal{NN}(x) + x$, so the input is again added to the output (a so-called add connection). In GeometricMachineLearning the specific ResNet that we use consists of a series of simple ResNetLayers.
GeometricMachineLearning.ResNetLayer
— TypeThe ResNetLayer is a simple feedforward neural network to which we add the input after applying it, i.e. it realizes $x \mapsto x + \sigma(Ax + b)$.
GeometricMachineLearning.SkewSymMatrix
— TypeSkewSymMatrix(S::AbstractVector, n::Integer)
Instantiate a skew-symmetric matrix with information stored in vector S.
A skew-symmetric matrix $A$ is a matrix with $A^T = -A$.
Internally the struct saves a vector $S$ of size $n(n-1)\div2$. The conversion is done the following way:
\[[A]_{ij} = \begin{cases} 0 & \text{if $i=j$,} \\ S[( (i-2) (i-1) ) \div 2 + j] & \text{if $i>j$,}\\ -S[( (j-2) (j-1) ) \div 2 + i] & \text{else}. \end{cases}\]
Also see SymmetricMatrix, LowerTriangular and UpperTriangular.
Examples
using GeometricMachineLearning
S = [1, 2, 3, 4, 5, 6]
SkewSymMatrix(S, 4)
# output
4×4 SkewSymMatrix{Int64, Vector{Int64}}:
0 -1 -2 -4
1 0 -3 -5
2 3 0 -6
4 5 6 0
GeometricMachineLearning.SkewSymMatrix
— MethodSkewSymMatrix(A::AbstractMatrix)
Perform 0.5 * (A - A') and store the matrix in an efficient way (as a vector with $n(n-1)/2$ entries).
If the constructor is called with a matrix as input it returns a skew-symmetric matrix via the projection:
\[A \mapsto \frac{1}{2}(A - A^T).\]
Examples
using GeometricMachineLearning
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
SkewSymMatrix(M)
# output
4×4 SkewSymMatrix{Float64, Vector{Float64}}:
0.0 -1.5 -3.0 -4.5
1.5 0.0 -1.5 -3.0
3.0 1.5 0.0 -1.5
4.5 3.0 1.5 0.0
Extend help
Note that the constructor is designed in such a way that it always returns matrices of type SkewSymMatrix{<:AbstractFloat} when called with a matrix, even if this matrix is of type AbstractMatrix{<:Integer}.
If the user wishes to allocate a matrix SkewSymMatrix{<:Integer} the constructor SkewSymMatrix(::AbstractVector, n::Integer) has to be called.
GeometricMachineLearning.StandardTransformerIntegrator
— TypeThe regular transformer used as an integrator (multi-step method).
The constructor is called with one argument:
- sys_dim::Int
The following are keyword arguments:
- transformer_dim::Int: the default is transformer_dim = sys_dim.
- n_blocks::Int: The default is 1.
- n_heads::Int: the number of heads in the multihead attention layer (default is n_heads = sys_dim).
- L::Int: the number of transformer blocks (default is L = 2).
- upscaling_activation: by default identity.
- resnet_activation: by default tanh.
- add_connection::Bool=true: if the input should be added to the output.
GeometricMachineLearning.StiefelLayer
— TypeDefines a layer that performs simple multiplication with an element of the Stiefel manifold.
GeometricMachineLearning.StiefelLieAlgHorMatrix
— TypeStiefelLieAlgHorMatrix(A::SkewSymMatrix{T}, B::AbstractMatrix{T}, N::Integer, n::Integer) where T
Build an instance of StiefelLieAlgHorMatrix based on a skew-symmetric matrix A and an arbitrary matrix B.
StiefelLieAlgHorMatrix is the horizontal component of the Lie algebra of skew-symmetric matrices (with respect to the canonical metric). The projection here is $\pi:S \to SE$, where
\[E = \begin{pmatrix} \mathbb{I}_{n} \\ \mathbb{O}_{(N-n)\times{}n} \end{pmatrix}.\]
The matrix $E$ is implemented under StiefelProjection in GeometricMachineLearning.
An element of StiefelLieAlgHorMatrix takes the form:
\[\begin{pmatrix} A & B^T \\ B & \mathbb{O} \end{pmatrix},\]
where $A$ is skew-symmetric (this is SkewSymMatrix in GeometricMachineLearning).
Also see GrassmannLieAlgHorMatrix.
GeometricMachineLearning.StiefelLieAlgHorMatrix
— MethodStiefelLieAlgHorMatrix(D::AbstractMatrix, n::Integer)
Take a big matrix as input and build an instance of StiefelLieAlgHorMatrix belonging to the Stiefel manifold $St(n, N)$ where $N$ is the number of rows of D.
If the constructor is called with a big $N\times{}N$ matrix, then the projection is performed the following way:
\[\begin{pmatrix} A & B_1 \\ B_2 & D \end{pmatrix} \mapsto \begin{pmatrix} \mathrm{skew}(A) & -B_2^T \\ B_2 & \mathbb{O} \end{pmatrix}.\]
The operation $\mathrm{skew}:\mathbb{R}^{n\times{}n}\to\mathcal{S}_\mathrm{skew}(n)$ is the skew-symmetrization operation. This is equivalent to calling SkewSymMatrix with an $n\times{}n$ matrix.
This can also be seen as the operation:
\[D \mapsto \Omega(E, DE) = \mathrm{skew}\left(2 \left(\mathbb{I} - \frac{1}{2} E E^T \right) DE E^T\right).\]
Also see GeometricMachineLearning.Ω.
GeometricMachineLearning.StiefelManifold
— TypeAn implementation of the Stiefel manifold [7]. The Stiefel manifold is the collection of all matrices $Y\in\mathbb{R}^{N\times{}n}$ whose columns are orthonormal, i.e.
\[ St(n, N) = \{Y: Y^TY = \mathbb{I}_n \}.\]
The Stiefel manifold can be shown to have manifold structure (as the name suggests) and this is heavily used in GeometricMachineLearning. It is further a compact space. More information can be found in the docstrings for rgrad(::StiefelManifold, ::AbstractMatrix) and metric(::StiefelManifold, ::AbstractMatrix, ::AbstractMatrix).
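A short check of the defining property (a sketch; rand for manifold types is documented further below):
using GeometricMachineLearning
using LinearAlgebra: I
Y = rand(StiefelManifold{Float64}, 5, 3)
Y' * Y ≈ I(3)
# output
true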
GeometricMachineLearning.StiefelProjection
— TypeOuter constructor for StiefelProjection
. This works with two integers as input and optionally the type.
GeometricMachineLearning.StiefelProjection
— TypeStiefelProjection(backend, T, N, n)
Make a matrix of the form $\begin{bmatrix} \mathbb{I} & \mathbb{O} \end{bmatrix}^T$ for a specific backend and data type.
An array that essentially does vcat(I(n), zeros(N-n, n))
with GPU support.
Extend help
Technically this should be a subtype of StiefelManifold
.
GeometricMachineLearning.StiefelProjection
— MethodStiefelProjection(A::AbstractMatrix)
Extract necessary information from A and build an instance of StiefelProjection.
Necessary information here refers to the backend, the data type and the size of the matrix.
GeometricMachineLearning.StiefelProjection
— MethodStiefelProjection(B::AbstractLieAlgHorMatrix)
Extract necessary information from B and build an instance of StiefelProjection.
Necessary information here refers to the backend, the data type and the size of the matrix. The size is queried through B.N and B.n.
Examples
using GeometricMachineLearning
B₁ = rand(StiefelLieAlgHorMatrix, 5, 2)
B₂ = rand(GrassmannLieAlgHorMatrix, 5, 2)
E = [1. 0.; 0. 1.; 0. 0.; 0. 0.; 0. 0.]
StiefelProjection(B₁) ≈ StiefelProjection(B₂) ≈ E
# output
true
GeometricMachineLearning.SymmetricMatrix
— TypeSymmetricMatrix(S::AbstractVector, n::Integer)
Instantiate a symmetric matrix with information stored in vector S.
A SymmetricMatrix $A$ is a matrix with $A^T = A$.
Internally the struct saves a vector $S$ of size $n(n+1)\div2$. The conversion is done the following way:
\[[A]_{ij} = \begin{cases} S[( (i-1) i ) \div 2 + j] & \text{if $i\geq{}j$}\\ S[( (j-1) j ) \div 2 + i] & \text{else}. \end{cases}\]
So $S$ stores a string of vectors taken from $A$: $S = [\tilde{a}_1, \tilde{a}_2, \ldots, \tilde{a}_n]$ with $\tilde{a}_i = [[A]_{i1},[A]_{i2},\ldots,[A]_{ii}]$.
Also see SkewSymMatrix, LowerTriangular and UpperTriangular.
Examples
using GeometricMachineLearning
S = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
SymmetricMatrix(S, 4)
# output
4×4 SymmetricMatrix{Int64, Vector{Int64}}:
1 2 4 7
2 3 5 8
4 5 6 9
7 8 9 10
GeometricMachineLearning.SymmetricMatrix
— MethodSymmetricMatrix(A::AbstractMatrix)
Perform 0.5 * (A + A') and store the matrix in an efficient way (as a vector with $n(n+1)/2$ entries).
If the constructor is called with a matrix as input it returns a symmetric matrix via the projection:
\[A \mapsto \frac{1}{2}(A + A^T).\]
Examples
using GeometricMachineLearning
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
SymmetricMatrix(M)
# output
4×4 SymmetricMatrix{Float64, Vector{Float64}}:
1.0 3.5 6.0 8.5
3.5 6.0 8.5 11.0
6.0 8.5 11.0 13.5
8.5 11.0 13.5 16.0
Extend help
Note that the constructor is designed in such a way that it always returns matrices of type SymmetricMatrix{<:AbstractFloat} when called with a matrix, even if this matrix is of type AbstractMatrix{<:Integer}.
If the user wishes to allocate a matrix SymmetricMatrix{<:Integer} the constructor SymmetricMatrix(::AbstractVector, n::Integer) has to be called.
GeometricMachineLearning.SympNet
— TypeThe SympNet type encompasses GSympNets and LASympNets. SympNets are universal approximators of symplectic flows, i.e. maps $\varphi:\mathbb{R}^{2n}\to\mathbb{R}^{2n}$ for which $(\nabla\varphi)^T\mathbb{J}\nabla\varphi = \mathbb{J}$ holds.
GeometricMachineLearning.SympNetLayer
— TypeImplements the various layers from the SympNet paper [24]. This is a super type of GradientLayer, ActivationLayer and LinearLayer.
For the linear layer, the activation and the bias are left out, and for the activation layer $K$ and $b$ are left out!
GeometricMachineLearning.SympNetLayer
— MethodThis is called when a SympNetLayer is applied to a NamedTuple. It calls apply_layer_to_nt_and_return_array.
GeometricMachineLearning.SymplecticAutoencoder
— TypeSymplecticAutoencoder(full_dim, reduced_dim)
Make an instance of SymplecticAutoencoder for dimensions full_dim and reduced_dim (those have to be provided as Integers).
The architecture
The symplectic autoencoder architecture was introduced in [40]. Like any other autoencoder it consists of an encoder $\Psi^e:\mathbb{R}^{2N}\to\mathbb{R}^{2n}$ and a decoder $\Psi^d:\mathbb{R}^{2n}\to\mathbb{R}^{2N}$ with $n\ll{}N$. These satisfy the following properties:
\[\nabla_z\Psi^e\mathbb{J}_{2N}(\nabla_z\Psi^e\mathbb{J}_{2N})^T = \mathbb{J}_{2n} \text{ and } (\nabla_\xi\Psi^d)^T\mathbb{J}_{2N}\nabla_\xi\Psi^d = \mathbb{J}_{2n}.\]
Because the decoder has this particular property, the reduced system can be described by the Hamiltonian $H\circ\Psi^d$:
\[\mathbb{J}_{2n}\nabla_\xi(H\circ\Psi^d) = \mathbb{J}_{2n}(\nabla_\xi\Psi^d)^T\nabla_{\Psi^d(\xi)}H = \mathbb{J}_{2n}(\nabla_\xi\Psi^d)^T\mathbb{J}_{2N}^T\mathbb{J}_{2N}\nabla_{\Psi^d(\xi)}H = (\nabla_\xi\Psi^d)^+X_H(\Psi^d(\xi)),\]
where $(\nabla_\xi\Psi^d)^+$ is the pseudoinverse of $\nabla_\xi\Psi^d$ (for more details see the docs on the AutoEncoder type).
Arguments
You can provide the following keyword arguments:
- n_encoder_layers::Integer = 4: The number of layers in one encoder block.
- n_encoder_blocks::Integer = 2: The number of encoder blocks.
- n_decoder_layers::Integer = 1: The number of layers in one decoder block.
- n_decoder_blocks::Integer = 3: The number of decoder blocks.
- sympnet_upscale::Integer = 5: The upscaling dimension of the GSympNet. See GradientLayerQ and GradientLayerP.
- activation = tanh: The activation in the gradient layers.
- encoder_init_q::Bool = true: Specifies if the first layer in each encoder block should be of $q$ type.
- decoder_init_q::Bool = true: Specifies if the first layer in each decoder block should be of $q$ type.
GeometricMachineLearning.SymplecticDecoder
— TypeAbstract SymplecticDecoder type.
See Decoder for the super type and NonLinearSymplecticDecoder for a derived struct.
GeometricMachineLearning.SymplecticEncoder
— TypeAbstract SymplecticEncoder type.
See Encoder for the super type and NonLinearSymplecticEncoder for a derived struct.
GeometricMachineLearning.SymplecticPotential
— TypeSymplecticPotential(n)
Returns the symplectic matrix of size $2n \times 2n$:
\[\begin{pmatrix} \mathbb{O} & \mathbb{I} \\ -\mathbb{I} & \mathbb{O} \\ \end{pmatrix}\]
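In plain Julia this matrix can be written down directly (a sketch for n = 2, not the package implementation):
using LinearAlgebra: I
n = 2
𝕁 = [zeros(n, n) Matrix(1.0I, n, n); -Matrix(1.0I, n, n) zeros(n, n)]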
GeometricMachineLearning.TrainingData
— TypeTrainingData stores:
- problem
- shape
- get
- symbols
- dim
- noisemaker
GeometricMachineLearning.TransformerIntegrator
— TypeEncompasses various transformer architectures, such as the VolumePreservingTransformer
and the LinearSymplecticTransformer
.
GeometricMachineLearning.TransformerLoss
— TypeTransformerLoss(seq_length, prediction_window)
Make an instance of the transformer loss.
This is the loss for a transformer network (especially a transformer integrator).
Parameters
The prediction_window specifies how many time steps are predicted into the future. It defaults to the value specified for seq_length.
GeometricMachineLearning.UpperTriangular
— TypeUpperTriangular(S::AbstractVector, n::Int)
Build an upper-triangular matrix from a vector.
An upper-triangular matrix is an $n\times{}n$ matrix that has zeros on the diagonal and in the lower-triangular part.
The data are stored in a vector $S$ similarly to other matrices. See LowerTriangular, SkewSymMatrix and SymmetricMatrix.
The struct has two fields: S and n. The first stores all the entries of the matrix in a sparse fashion (in a vector) and the second is the dimension $n$ for $A\in\mathbb{R}^{n\times{}n}$.
Examples
using GeometricMachineLearning
S = [1, 2, 3, 4, 5, 6]
UpperTriangular(S, 4)
# output
4×4 UpperTriangular{Int64, Vector{Int64}}:
0 1 2 4
0 0 3 5
0 0 0 6
0 0 0 0
GeometricMachineLearning.UpperTriangular
— MethodUpperTriangular(A::AbstractMatrix)
Build an upper-triangular matrix from a matrix.
This is done by taking the upper right of that matrix.
Examples
using GeometricMachineLearning
M = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
UpperTriangular(M)
# output
4×4 UpperTriangular{Int64, Vector{Int64}}:
0 2 3 4
0 0 7 8
0 0 0 12
0 0 0 0
GeometricMachineLearning.VolumePreservingAttention
— TypeVolume-preserving attention (single head attention)
Drawbacks:
- the super-fast activation is only implemented for sequence lengths of 2, 3, 4 and 5.
- other sequence lengths only work on CPU for now (an LU decomposition has to be implemented to work for tensors in parallel).
Constructor
The constructor is called with:
- dim::Int: The system dimension.
- seq_length::Int: The sequence length to be considered. The default is zero, i.e. arbitrary sequence lengths; this works for all sequence lengths but doesn't apply the super-fast activation.
- skew_sym::Bool (keyword argument): specifies whether the weight matrix is skew-symmetric or arbitrary (default is false).
Functor
Applying a layer of type VolumePreservingAttention does the following:
- First we perform the operation $X \mapsto X^T A X =: C$, where $X\in\mathbb{R}^{N\times\mathtt{seq\_length}}$ is a matrix containing time series data and $A$ is the skew-symmetric matrix associated with the layer.
- In a second step we compute the Cayley transform of $C$: $\Lambda = \mathrm{Cayley}(C)$.
- The output of the layer is then $X\Lambda$.
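A plain-matrix sketch of these three steps (assuming one common convention for the Cayley transform; the values of A and X are hypothetical):
using LinearAlgebra: I
A = let S = randn(4, 4); (S - S') / 2 end   # skew-symmetric weight
X = randn(4, 3)                             # N = 4, seq_length = 3
C = X' * A * X                              # C is skew-symmetric
Λ = (I - C / 2) \ (I + C / 2)               # Cayley transform: orthogonal with determinant 1
output = X * Λ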
GeometricMachineLearning.VolumePreservingFeedForward
— TypeRealizes a volume-preserving neural network as a combination of VolumePreservingLowerLayer
and VolumePreservingUpperLayer
.
Constructor
The constructor is called with the following arguments:
- sys_dim::Int: The system dimension.
- n_blocks::Int: The number of blocks in the neural network (containing linear layers and nonlinear layers). Default is 1.
- n_linear::Int: The number of linear VolumePreservingLowerLayers and VolumePreservingUpperLayers in one block. Default is 1.
- activation: The activation function for the nonlinear layers in a block.
- init_upper::Bool=false (keyword argument): Specifies if the first layer is lower or upper.
GeometricMachineLearning.VolumePreservingFeedForwardLayer
— TypeSuper-type of VolumePreservingLowerLayer
and VolumePreservingUpperLayer
. The layers do the following:
\[x \mapsto \begin{cases} \sigma(Lx + b) & \text{where $L$ is }\mathtt{LowerTriangular} \\ \sigma(Ux + b) & \text{where $U$ is }\mathtt{UpperTriangular}. \end{cases}\]
The functor can be applied to a vector, a matrix or a tensor.
Constructor
The constructors are called with:
- sys_dim::Int: the system dimension.
- activation=tanh: the activation function.
- include_bias::Bool=true (keyword argument): specifies whether a bias should be used.
GeometricMachineLearning.VolumePreservingLowerLayer
— TypeSee the documentation for VolumePreservingFeedForwardLayer
.
GeometricMachineLearning.VolumePreservingTransformer
— TypeThe volume-preserving transformer with the Cayley activation function and built-in upscaling.
Constructor
The arguments for the constructor are:
- sys_dim::Int
- seq_length::Int: The sequence length of the data fed into the transformer.
The following are keyword arguments:
- n_blocks::Int=1: The number of blocks in one transformer unit (containing linear layers and nonlinear layers).
- n_linear::Int=1: The number of linear VolumePreservingLowerLayers and VolumePreservingUpperLayers in one block.
- L::Int=1: The number of transformer units.
- activation=tanh: The activation function.
- init_upper::Bool=false: Specifies if the network first acts on the $q$ component.
- skew_sym::Bool=false: specifies whether the weight matrix is skew-symmetric or arbitrary.
GeometricMachineLearning.VolumePreservingUpperLayer
— TypeSee the documentation for VolumePreservingFeedForwardLayer
.
AbstractNeuralNetworks.update!
— Methodupdate!(o, cache, dx::AbstractArray)
Update the cache based on the gradient information dx, compute the final velocity and store it in dx.
The optimizer o is needed because some updating schemes (such as AdamOptimizer) also need information on the current time step.
AbstractNeuralNetworks.update!
— Methodupdate!(o, cache, B)
First update the cache and then update the array B based on the optimizer o.
Note that $B\in\mathfrak{g}^\mathrm{hor}$ in general.
AbstractNeuralNetworks.update!
— Methodupdate!(o::Optimizer{<:BFGSOptimizer}, C, B)
Perform an update with the BFGS optimizer.
C is the cache, B contains the gradient information (the output of global_rep in general).
First we compute the final velocity with
vecS = -o.method.η * C.H * vec(B)
and then we update H
C.H .= (𝕀 - ρ * SY) * C.H * (𝕀 - ρ * SY') + ρ * vecS * vecS'
where SY is vecS * Y' and 𝕀 is the identity.
Implementation
For stability we use δ for computing ρ:
ρ = 1. / (vecS' * Y + o.method.δ)
This is similar to the AdamOptimizer.
Extend Help
If we have weights on a Manifold then the updates are slightly more difficult. In this case the vec operation has to be generalized to the corresponding global tangent space.
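A plain-array sketch of this update (hypothetical values; Y, the difference of gradients, is assumed given):
using LinearAlgebra: I
η, δ = 0.1, 1e-8
H = Matrix(1.0I, 2, 2)            # inverse-Hessian cache
B = [1.0, 2.0]                    # gradient information
Y = [0.5, 0.5]                    # difference of gradients
vecS = -η * H * B                 # final velocity
ρ = 1.0 / (vecS' * Y + δ)
SY = vecS * Y'
H .= (I - ρ * SY) * H * (I - ρ * SY') + ρ * vecS * vecS'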
Base.:*
— Method*(λY, Y)
Apply the element λY onto Y.
Here λY is an element of a Lie group and Y is an element of a homogeneous space.
Base.iterate
— MethodThis function computes a trajectory for a Transformer that has already been trained, for validation purposes.
It takes as input:
- nn: a NeuralNetwork (that has been trained).
- ics: initial conditions (a matrix in $\mathbb{R}^{2n\times\mathtt{seq\_length}}$ or a NamedTuple of two matrices in $\mathbb{R}^{n\times\mathtt{seq\_length}}$).
- n_points::Int=100 (keyword argument): The number of steps for which we run the prediction.
- prediction_window::Int=size(ics.q, 2): The prediction window (i.e. the number of steps we predict into the future) is equal to the sequence length (i.e. the number of input time steps) by default.
Base.iterate
— MethodThis function computes a trajectory for a SympNet that has already been trained, for validation purposes.
It takes as input:
- nn: a NeuralNetwork (that has been trained).
- ics: initial conditions (a NamedTuple of two vectors).
Base.rand
— Methodrand(backend::KernelAbstractions.Backend, manifold_type::Type{MT}, N::Integer, n::Integer) where MT <: Manifold)
Draw random elements for a specific device.
Examples
using GeometricMachineLearning
using GeometricMachineLearning: _round # hide
import Random
Random.seed!(123)
N, n = 5, 3
Y = rand(CPU(), StiefelManifold{Float32}, N, n)
_round(Y; digits = 5) # hide
# output
5×3 StiefelManifold{Float32, Matrix{Float32}}:
-0.27575 0.32991 0.77275
-0.62485 -0.33224 -0.0686
-0.69333 0.36724 -0.18988
-0.09295 -0.73145 0.46064
0.2102 0.33301 0.38717
Random elements of the manifold can also be allocated on GPU, e.g. via
rand(CUDABackend(), StiefelManifold{Float32}, N, n)
for drawing elements on a CUDA device.
Base.rand
— Methodrand(manifold_type::Type{MT}, N::Integer, n::Integer) where MT <: Manifold
Draw random elements from the Stiefel and the Grassmann manifold.
Because both of these manifolds are compact spaces we can sample them uniformly [8].
Examples
When we call ...
using GeometricMachineLearning
using GeometricMachineLearning: _round # hide
import Random
Random.seed!(123)
N, n = 5, 3
Y = rand(StiefelManifold{Float32}, N, n)
_round(Y; digits = 5) # hide
# output
5×3 StiefelManifold{Float32, Matrix{Float32}}:
-0.27575 0.32991 0.77275
-0.62485 -0.33224 -0.0686
-0.69333 0.36724 -0.18988
-0.09295 -0.73145 0.46064
0.2102 0.33301 0.38717
... the sampling is done by first allocating a random matrix of size $N\times{}n$ via Y = randn(Float32, N, n). We then perform a QR decomposition Q, R = qr(Y) with the qr function from the LinearAlgebra package (this uses Householder reflections internally). The final output are then the first n columns of the Q matrix.
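The procedure just described, as a plain-Julia sketch (not the package internals verbatim):
using LinearAlgebra: qr
import Random
Random.seed!(123)
N, n = 5, 3
A = randn(Float32, N, n)
Q, R = qr(A)
Y = Matrix(Q)[:, 1:n]   # first n columns of Q; Y' * Y ≈ I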
Base.vec
— MethodIf vec is applied onto Triangular, then the output is the associated vector.
Base.vec
— MethodIf vec is applied onto SkewSymMatrix, then the output is the associated vector.
ChainRulesCore.rrule
— MethodThis implements the custom pullback for tensor_transpose_mat_mul.
GeometricMachineLearning.Gradient
— FunctionThis is an old constructor and will be deprecated. For change_q=true it is equivalent to GradientLayerQ; for change_q=false it is equivalent to GradientLayerP.
If full_grad=false then ActivationLayer is called.
GeometricMachineLearning.Transformer
— MethodThe architecture for a "transformer encoder" is essentially taken from arXiv:2010.11929, but with the difference that no layer normalization is employed. This is because we still need to find a generalization of layer normalization to manifolds.
The transformer is called with the following inputs:
- dim: the dimension of the transformer.
- n_heads: the number of heads.
- L: the number of transformer blocks.
In addition we have the following optional arguments:
- activation: the activation function used for the ResNet (tanh by default).
- Stiefel::Bool: if the matrices $P^V$, $P^Q$ and $P^K$ should live on a manifold (false by default).
- retraction: which retraction should be used (Geodesic() by default).
- add_connection::Bool: if the input should be added to the output after the MultiHeadAttention layer is used (true by default).
- use_bias::Bool: if the ResNet should use a bias (true by default).
GeometricMachineLearning.accuracy
— Methodaccuracy(nn, dl)
Compute the accuracy of a neural network classifier.
GeometricMachineLearning.accuracy
— Methodaccuracy(model, ps, dl)
Compute the accuracy of a neural network classifier.
GeometricMachineLearning.apply_layer_to_nt_and_return_array
— MethodThis function is used in the wrappers where the input to the SympNet layers is not a NamedTuple (as it should be) but an AbstractArray.
It converts the Array to a NamedTuple (via assign_q_and_p), then calls the SympNet routine(s) and converts back to an AbstractArray (with vcat).
GeometricMachineLearning.apply_section!
— Methodapply_section!(Y::AT, λY::GlobalSection{T, AT}, Y₂::AT) where {T, AT<:StiefelManifold{T}}
Apply λY to Y₂ and store the result in Y.
This is the inplace version of apply_section.
GeometricMachineLearning.apply_section
— Methodapply_section(λY::GlobalSection{T, AT}, Y₂::AT) where {T, AT <: StiefelManifold{T}}
Apply λY to Y₂.
Mathematically this is the group action of the element $\lambda{}Y\in{}G$ on the element $Y_2$ of the homogeneous space $\mathcal{M}$.
Internally it calls the inplace version apply_section!.
GeometricMachineLearning.assign_batch_kernel!
— MethodTakes as input a batch tensor (to which the data are assigned), the whole data tensor and two vectors params and time_steps that include the specific parameters and time steps we want to assign.
Note that this assigns sequential data! For e.g. being processed by a transformer.
GeometricMachineLearning.assign_output_estimate
— MethodThe function assign_output_estimate is closely related to the transformer. It takes the last prediction_window columns of the output and uses them for the final prediction, i.e.
\[\mathbb{R}^{N\times{}T}\to\mathbb{R}^{N\times\mathtt{pw}}, \begin{bmatrix} z^{(1)}_1 & \cdots & z^{(T)}_1 \\ \cdots & \cdots & \cdots \\ z^{(1)}_n & \cdots & z^{(T)}_n \end{bmatrix} \mapsto \begin{bmatrix} z^{(T - \mathtt{pw})}_1 & \cdots & z^{(T)}_1 \\ \cdots & \cdots & \cdots \\ z^{(T - \mathtt{pw})}_n & \cdots & z^{(T)}_n\end{bmatrix}.\]
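On a plain array the same cropping is just column indexing (a sketch, not the package implementation):
X = reshape(collect(1.0:12.0), 3, 4)   # N = 3, T = 4
pw = 2
X[:, end - pw + 1:end]                 # the last pw columns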
GeometricMachineLearning.assign_output_kernel!
— MethodThis should be used together with assign_batch_kernel!
. It assigns the corresponding output (i.e. target).
GeometricMachineLearning.assign_q_and_p
— MethodAllocates two new arrays q and p whose first dimension is half of that of the input x. This should also be supplied through the second argument N.
The output is a Tuple containing q and p.
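A plain sketch of the splitting (hypothetical input):
x = [1.0, 2.0, 3.0, 4.0]
N = 2
q, p = x[1:N], x[N + 1:2N]   # q = [1.0, 2.0], p = [3.0, 4.0]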
GeometricMachineLearning.augment_zeros_kernel!
— MethodUsed for differentiating assign_output_estimate (this appears in the loss).
GeometricMachineLearning.build_v_reduced
— MethodBuilds the reduced vector field based on the full vector field for a Hamiltonian system. We derive the reduced vector field via the reduced Hamiltonian: $\tilde{H} := H\circ\Psi^\mathrm{dec}$. We then get
\[\mathbb{J}_{2n}\nabla_\xi\tilde{H} = \mathbb{J}_{2n}(\nabla\Psi^\mathrm{dec})^T\mathbb{J}_{2N}^T\mathbb{J}_{2N}\nabla_z{}H = \mathbb{J}_{2n}(\nabla\Psi^\mathrm{dec})^T\mathbb{J}_{2N}^T \begin{pmatrix} v(z) \\ f(z) \end{pmatrix} = \begin{pmatrix} - (\nabla_p\Psi_q)^Tf(z) + (\nabla_p\Psi_p)^Tv(z) \\ (\nabla_q\Psi_q)^Tf(z) - (\nabla_q\Psi_p)^Tv(z) \end{pmatrix}.\]
GeometricMachineLearning.cayley
— Methodcayley(B::GrassmannLieAlgHorMatrix)
Compute the Cayley retraction of B and multiply it with E (the distinct element of the Stiefel manifold).
GeometricMachineLearning.cayley
— Methodcayley(B::StiefelLieAlgHorMatrix)
Compute the Cayley retraction of B
and multiply it with E
(the distinct element of the Stiefel manifold).
GeometricMachineLearning.cayley
— Methodcayley(Y::Manifold, Δ)
Take as input an element of a manifold Y
and a tangent vector in Δ
in the corresponding tangent space and compute the Cayley retraction.
In different notation: take as input an element $x$ of $\mathcal{M}$ and an element of $T_x\mathcal{M}$ and return $\mathrm{Cayley}(v_x).$ For example:
N, n = 5, 3
Y = rand(StiefelManifold{Float64}, N, n)
Δ = rgrad(Y, rand(N, n))
cayley(Y, Δ)
See the docstring for rgrad
for details on this function.
GeometricMachineLearning.compute_iterations
— MethodThis function gives iterations from the full dimension to the reduced dimension (i.e. the intermediate steps). The iterations are given in ascending order.
GeometricMachineLearning.compute_iterations_for_symplectic_system
— MethodThis function gives iterations from the full dimension to the reduced dimension (i.e. the intermediate steps). The iterations are given in ascending order. Only even steps are allowed here.
GeometricMachineLearning.compute_output_of_mha
— MethodApplies MHA to an abstract matrix. This is the same independent of whether the input is added to the output or not.
GeometricMachineLearning.convert_input_and_batch_indices_to_array
— MethodTakes the output of the batch functor and uses it to create the corresponding array (NamedTuples).
GeometricMachineLearning.convert_input_and_batch_indices_to_array
— MethodTakes the output of the batch functor and uses it to create the corresponding array.
GeometricMachineLearning.crop_array_for_transformer_loss
— MethodThis crops the output array of the neural network so that it conforms with the output it should be compared to. This is needed for the transformer loss.
GeometricMachineLearning.custom_mat_mul
— MethodMultiplies a matrix with a vector, a matrix or a tensor.
GeometricMachineLearning.decoder
— Methoddecoder(nn::NeuralNetwork{<:AutoEncoder})
Obtain the decoder from an AutoEncoder
neural network.
Both the input and the output are neural networks.
GeometricMachineLearning.decoder_layers_from_iteration
— MethodTakes as input the autoencoder architecture and a vector of integers specifying the layer dimensions in the decoder. Has to return a tuple of AbstractExplicitLayer
s.
GeometricMachineLearning.draw_batch!
— MethodThis assigns the batch if the data are in form of a matrix.
GeometricMachineLearning.encoder
— Methodencoder(nn::NeuralNetwork{<:AutoEncoder})
Obtain the encoder from an AutoEncoder
neural network.
Both the input and the output are neural networks.
GeometricMachineLearning.encoder_layers_from_iteration
— MethodTakes as input the autoencoder architecture and a vector of integers specifying the layer dimensions in the encoder. Has to return a tuple of AbstractExplicitLayer
s.
GeometricMachineLearning.geodesic
— Methodgeodesic(B::GrassmannLieAlgHorMatrix)
Compute the geodesic of an element in GrassmannLieAlgHorMatrix
.
GeometricMachineLearning.geodesic
— Methodgeodesic(B::StiefelLieAlgHorMatrix)
Compute the geodesic of an element in StiefelLieAlgHorMatrix
.
Implementation
This uses a computationally efficient version of the matrix exponential. See GeometricMachineLearning.𝔄
.
GeometricMachineLearning.geodesic
— Methodgeodesic(Y::Manifold, Δ)
Take as input an element of a manifold Y
and a tangent vector in Δ
in the corresponding tangent space and compute the geodesic (exponential map).
In different notation: take as input an element $x$ of $\mathcal{M}$ and an element of $T_x\mathcal{M}$ and return $\mathtt{geodesic}(x, v_x) = \exp(v_x).$ For example:
N, n = 5, 3
Y = rand(StiefelManifold{Float64}, N, n)
Δ = rgrad(Y, rand(N, n))
geodesic(Y, Δ)
See the docstring for rgrad
for details on this function.
GeometricMachineLearning.global_rep
— Methodglobal_rep(λY::GlobalSection{T, AT}, Δ::AbstractMatrix{T}) where {T, AT<:GrassmannManifold{T}}
Express Δ
(an element of the tangent space at Y
) as an instance of GrassmannLieAlgHorMatrix
.
The method global_rep
for GrassmannManifold
is similar to that for StiefelManifold
.
Examples
using GeometricMachineLearning
using GeometricMachineLearning: _round
import Random
Random.seed!(123)
Y = rand(GrassmannManifold, 6, 3)
Δ = rgrad(Y, randn(6, 3))
λY = GlobalSection(Y)
_round(global_rep(λY, Δ); digits = 3)
# output
6×6 GrassmannLieAlgHorMatrix{Float64, Matrix{Float64}}:
0.0 0.0 0.0 0.981 -2.058 0.4
0.0 0.0 0.0 -0.424 0.733 -0.919
0.0 0.0 0.0 -1.815 1.409 1.085
-0.981 0.424 1.815 0.0 0.0 0.0
2.058 -0.733 -1.409 0.0 0.0 0.0
-0.4 0.919 -1.085 0.0 0.0 0.0
GeometricMachineLearning.global_rep
— Methodglobal_rep(λY::GlobalSection{T, AT}, Δ::AbstractMatrix{T}) where {T, AT<:StiefelManifold{T}}
Express Δ
(an element of the tangent space at Y
) as an instance of StiefelLieAlgHorMatrix
.
This maps an element from $T_Y\mathcal{M}$ to an element of $\mathfrak{g}^\mathrm{hor}$.
These two spaces are isomorphic, and the isomorphism is established through $\lambda(Y)\in{}G$ via:
\[T_Y\mathcal{M} \to \mathfrak{g}^{\mathrm{hor}}, \Delta \mapsto \lambda(Y)^{-1}\Omega(Y, \Delta)\lambda(Y).\]
Also see GeometricMachineLearning.Ω
.
Examples
using GeometricMachineLearning
using GeometricMachineLearning: _round
import Random
Random.seed!(123)
Y = rand(StiefelManifold, 6, 3)
Δ = rgrad(Y, randn(6, 3))
λY = GlobalSection(Y)
_round(global_rep(λY, Δ); digits = 3)
# output
6×6 StiefelLieAlgHorMatrix{Float64, SkewSymMatrix{Float64, Vector{Float64}}, Matrix{Float64}}:
0.0 0.679 1.925 0.981 -2.058 0.4
-0.679 0.0 0.298 -0.424 0.733 -0.919
-1.925 -0.298 0.0 -1.815 1.409 1.085
-0.981 0.424 1.815 0.0 0.0 0.0
2.058 -0.733 -1.409 0.0 0.0 0.0
-0.4 0.919 -1.085 0.0 0.0 0.0
Implementation
The function global_rep
does not, in fact, perform the entire map $\lambda(Y)^{-1}\Omega(Y, \Delta)\lambda(Y)$ but only
\[\Delta \mapsto \mathrm{skew}(Y^T\Delta),\]
to get the small skew-symmetric matrix and
\[\Delta \mapsto (\lambda(Y)_{[1:N, n:N]}^T \Delta)_{[1:(N-n), 1:n]},\]
for the arbitrary matrix.
GeometricMachineLearning.global_section
— Methodglobal_section(Y::GrassmannManifold)
Compute a matrix of size $N\times(N-n)$ whose columns are orthogonal to the columns in Y
.
The method global_section
for the Grassmann manifold is equivalent to that for the StiefelManifold
(we represent the Grassmann manifold as an embedding in the Stiefel manifold).
See the documentation for global_section(Y::StiefelManifold{T}) where T
.
GeometricMachineLearning.global_section
— Methodglobal_section(Y::StiefelManifold)
Compute a matrix of size $N\times(N-n)$ whose columns are orthogonal to the columns in Y
.
This matrix is also called $Y_\perp$ [6, 10, 11].
Examples
using GeometricMachineLearning
using GeometricMachineLearning: global_section
import Random
Random.seed!(123)
Y = StiefelManifold([1. 0.; 0. 1.; 0. 0.; 0. 0.])
round.(Matrix(global_section(Y)); digits = 3)
# output
4×2 Matrix{Float64}:
0.0 -0.0
0.0 0.0
0.936 -0.353
0.353 0.936
Further note that we convert the QRCompactWYQ
object to a Matrix
before we display it.
Implementation
The implementation is done with a QR decomposition (LinearAlgebra.qr!
). Internally we do:
A = randn(N, N - n) # or the gpu equivalent
A = A - Y.A * (Y.A' * A)
qr!(A).Q
GeometricMachineLearning.init_optimizer_cache
— Methodinit_optimizer_cache(method, x)
Initialize the cache corresponding to the weights x
for a specific method.
Implementation
Wrapper for the functions setup_adam_cache
, setup_momentum_cache
, setup_gradient_cache
, setup_bfgs_cache
. These appear outside of optimizer_caches.jl
because the OptimizerMethods
first have to be defined.
GeometricMachineLearning.init_optimizer_cache
— Methodinit_optimizer_cache(method, x)
Initialize the optimizer cache based on input x
for the given method
.
GeometricMachineLearning.initialize_hessian_inverse
— Methodinitialize_hessian_inverse(B)
Initialize the inverse of the Hessian for various arrays.
Implementation
This requires an implementation of a vectorization operation vec
. This is important for custom arrays.
GeometricMachineLearning.map_index_for_symplectic_potential
— MethodThis assigns the right index for the symplectic potential. To be used with assign_ones_for_symplectic_potential_kernel!
.
GeometricMachineLearning.mat_tensor_mul!
— Methodmat_tensor_mul!(C, A, B)
Multiply the matrix A
onto the tensor B
from the left and store the result in C
.
Also checks the bounds of the input arrays.
The function mat_tensor_mul
calls mat_tensor_mul!
internally.
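A usage sketch of the in-place version, checked against a naive per-slice multiplication (the arrays here are made up for the example):
using GeometricMachineLearning: mat_tensor_mul!
A = [3.0 0.0; 0.0 2.0]
B = [1.0 1.0; 1.0 1.0;;; 2.0 2.0; 2.0 2.0]
C = zeros(2, 2, 2)
mat_tensor_mul!(C, A, B)
C[:, :, 1] == A * B[:, :, 1] && C[:, :, 2] == A * B[:, :, 2]
# output
true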
GeometricMachineLearning.mat_tensor_mul!
— Methodmat_tensor_mul!(C::AbstractArray{T, 3}, A::LowerTriangular{T}, B::AbstractArray{T, 3}) where T
Multiply the lower-triangular matrix A
onto the tensor B
from the left and store the result in C
.
Also checks the bounds of the input arrays.
This performs an efficient multiplication based on the special structure of the lower-triangular matrix A
.
GeometricMachineLearning.mat_tensor_mul!
— Methodmat_tensor_mul!(C::AbstractArray{T, 3}, A::SkewSymMatrix{T}, B::AbstractArray{T, 3}) where T
Multiply the skew-symmetric matrix A
onto the tensor B
from the left and store the result in C
.
Also checks the bounds of the input arrays.
This performs an efficient multiplication based on the special structure of the skew-symmetric matrix A
.
GeometricMachineLearning.mat_tensor_mul!
— Methodmat_tensor_mul!(C::AbstractArray{T, 3}, A::SymmetricMatrix{T}, B::AbstractArray{T, 3}) where T
Multiply the symmetric matrix A
onto the tensor B
from the left and store the result in C
.
Also checks the bounds of the input arrays.
This performs an efficient multiplication based on the special structure of the symmetric matrix A
.
GeometricMachineLearning.mat_tensor_mul!
— Methodmat_tensor_mul!(C::AbstractArray{T, 3}, A::UpperTriangular{T}, B::AbstractArray{T, 3}) where T
Multiply the upper-triangular matrix A
onto the tensor B
from the left and store the result in C
.
Also checks the bounds of the input arrays.
This performs an efficient multiplication based on the special structure of the upper-triangular matrix A
.
GeometricMachineLearning.mat_tensor_mul
— MethodExtend mat_tensor_mul
to a multiplication by the adjoint of an element of StiefelManifold
.
GeometricMachineLearning.mat_tensor_mul
— Methodmat_tensor_mul(A::AbstractMatrix{T}, B::AbstractArray{T, 3}) where T
Multiply the matrix A
onto the tensor B
from the left.
Internally this calls the inplace version mat_tensor_mul!
.
Examples
using GeometricMachineLearning: mat_tensor_mul
B = [1 1 1; 1 1 1; 1 1 1;;; 2 2 2; 2 2 2; 2 2 2]
A = [3 0 0; 0 2 0; 0 0 1]
mat_tensor_mul(A, B)
# output
3×3×2 Array{Int64, 3}:
[:, :, 1] =
3 3 3
2 2 2
1 1 1
[:, :, 2] =
6 6 6
4 4 4
2 2 2
GeometricMachineLearning.mat_tensor_mul
— MethodExtend mat_tensor_mul
to a multiplication by an element of StiefelManifold
.
GeometricMachineLearning.metric
— Methodmetric(Y::GrassmannManifold, Δ₁::AbstractMatrix, Δ₂::AbstractMatrix)
Compute the metric for vectors Δ₁
and Δ₂
at Y
.
The representation of the Grassmann manifold is realized as a quotient space of the Stiefel manifold.
The metric for the Grassmann manifold is:
\[g^{Gr}_Y(\Delta_1, \Delta_2) = g^{St}_Y(\Delta_1, \Delta_2) = \mathrm{Tr}(\Delta_1^T (\mathbb{I} - Y Y^T) \Delta_2) = \mathrm{Tr}(\Delta_1^T \Delta_2).\]
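A sketch that checks the last equality numerically (the Riemannian gradients produced by rgrad are orthogonal to $Y$, so the projector drops out):
using GeometricMachineLearning
using GeometricMachineLearning: metric
using LinearAlgebra: tr
Y = rand(GrassmannManifold, 6, 3)
Δ₁ = rgrad(Y, randn(6, 3))
Δ₂ = rgrad(Y, randn(6, 3))
metric(Y, Δ₁, Δ₂) ≈ tr(Δ₁' * Δ₂)
# output
true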
GeometricMachineLearning.metric
— MethodImplements the canonical Riemannian metric for the Stiefel manifold:
\[g_Y: (\Delta_1, \Delta_2) \mapsto \mathrm{tr}(\Delta_1^T(\mathbb{I} - \frac{1}{2}YY^T)\Delta_2).\]
It is called with:
- Y::StiefelManifold
- Δ₁::AbstractMatrix
- Δ₂::AbstractMatrix
GeometricMachineLearning.number_of_batches
— MethodGives the number of batches. Inputs are of type DataLoader
and Batch
.
Here the big distinction is between data that are time-series like and data that are autoencoder like.
GeometricMachineLearning.onehotbatch
— MethodOne-hot-batch encoding of a vector of integers: $input\in\{0,1,\ldots,9\}^\ell$. The output is a tensor of shape $10\times1\times\ell$.
\[0 \mapsto \begin{bmatrix} 1 & 0 & \ldots & 0 \end{bmatrix}.\]
In more abstract terms: $i \mapsto e_i$.
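A plain-Julia sketch of this encoding (an illustration, not the library kernel; the helper name is made up):
function sketch_onehotbatch(v::AbstractVector{<:Integer})
    out = zeros(Int, 10, 1, length(v))   # 10 classes, one channel, ℓ samples
    for (ℓ, i) in enumerate(v)
        out[i + 1, 1, ℓ] = 1             # integer i maps to the (i+1)-th unit vector
    end
    out
end
size(sketch_onehotbatch([0, 3, 9]))
# output
(10, 1, 3)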
GeometricMachineLearning.optimization_step!
— Methodoptimization_step!(o::Optimizer, λY::NamedTuple, ps::NamedTuple, dx::NamedTuple)
Optimize a neural network consisting of a single AbstractExplicitLayer
.
GeometricMachineLearning.optimization_step!
— Methodoptimization_step!(o::Optimizer, λY::Chain, ps::Tuple, dx::Tuple)
Optimize a neural network built with Chain
.
GeometricMachineLearning.optimization_step!
— Methodoptimization_step!(o, λY, ps, cache, dx)
Update the weights ps
of a layer
based on an Optimizer
, a cache
and first-order derivatives dx
.
The derivatives dx
here are usually obtained via an AD routine by differentiating a loss function, i.e. dx
is $\nabla_xL$.
It calls the function update!
internally; update! has to be implemented for every OptimizerMethod
.
GeometricMachineLearning.optimize_for_one_epoch!
— MethodOptimize for an entire epoch. For this you have to supply:
- an instance of the optimizer.
- the neural network model
- the parameters of the model
- the data (in form of
DataLoader
) - in instance of
Batch
that containsbatch_size
(and optionallyseq_length
)
With the optional argument:
- the loss, which takes the
model
, the parametersps
and an instance ofDataLoader
as input.
The output of optimize_for_one_epoch!
is the average loss over all batches of the epoch:
\[output = \frac{1}{\mathtt{steps\_per\_epoch}}\sum_{t=1}^\mathtt{steps\_per\_epoch}loss(\theta^{(t-1)}).\]
This is done because any reverse differentiation routine always has two outputs: a pullback and the value of the function it is differentiating. In the case of Zygote: loss_value, pullback = Zygote.pullback(ps -> loss(ps), ps)
(if the loss only depends on the parameters).
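A toy illustration of this Zygote pattern (the loss here is made up for the example):
import Zygote
loss(ps) = sum(abs2, ps)  # a toy loss that only depends on the parameters
ps = [1.0, 2.0, 3.0]
loss_value, pullback = Zygote.pullback(loss, ps)
loss_value == 14.0 && pullback(1.0)[1] ≈ 2 .* ps
# output
true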
GeometricMachineLearning.patch_index
— MethodBased on the coordinates i,j this returns the batch index (for the MNIST data set, for now).
GeometricMachineLearning.rgrad
— Methodrgrad(Y::GrassmannManifold, e_grad::AbstractMatrix)
Compute the Riemannian gradient at $Y\in{}Gr(n, N)$.
These gradients have the property that they are orthogonal to the space spanned by $Y$.
The precise form of the mapping is:
\[\mathtt{rgrad}(Y, \nabla{}L) \mapsto \nabla{}L - YY^T\nabla{}L\]
Note the property $Y^T\mathrm{rgrad}(Y, \nabla{}L) = \mathbb{O}.$
Also see rgrad(::StiefelManifold, ::AbstractMatrix)
.
Examples
using GeometricMachineLearning
Y = GrassmannManifold([1 0 ; 0 1 ; 0 0; 0 0])
Δ = [1 2; 3 4; 5 6; 7 8]
rgrad(Y, Δ)
# output
4×2 Matrix{Int64}:
0 0
0 0
5 6
7 8
GeometricMachineLearning.rgrad
— Methodrgrad(Y::StiefelManifold, e_grad::AbstractMatrix)
Compute the Riemannian gradient for the Stiefel manifold at $Y\in{}St(N,n)$ based on $\nabla{}L\in\mathbb{R}^{N\times{}n}$ (the Euclidean gradient).
The function computes the Riemannian gradient with respect to the canonical metric
.
The precise form of the mapping is:
\[\mathtt{rgrad}(Y, \nabla{}L) \mapsto \nabla{}L - Y(\nabla{}L)^TY\]
Note the property $Y^T\mathrm{rgrad}(Y, \nabla{}L)\in\mathcal{S}_\mathrm{skew}(n).$
Examples
using GeometricMachineLearning
Y = StiefelManifold([1 0 ; 0 1 ; 0 0; 0 0])
Δ = [1 2; 3 4; 5 6; 7 8]
rgrad(Y, Δ)
# output
4×2 Matrix{Int64}:
0 -1
1 0
5 6
7 8
GeometricMachineLearning.solve!
— MethodPSDArch does not require neural network training since it is a strictly linear operation that can be solved with singular value decomposition (SVD).
GeometricMachineLearning.split_and_flatten
— Methodsplit_and_flatten
takes a tensor as input and produces another one as output (essentially rearranges the input data in an intricate way) so that it can easily be processed with a transformer.
The optional arguments are:
- patch_length: by default this is 7.
- number_of_patches: by default this is 16.
GeometricMachineLearning.tensor_mat_mul!
— Methodtensor_mat_mul!(C, A, B)
Multiply the matrix B
onto the tensor A
from the right and store the result in C
.
Also checks the bounds of the input arrays.
The function tensor_mat_mul
calls tensor_mat_mul!
internally.
GeometricMachineLearning.tensor_mat_mul!
— Methodtensor_mat_mul!(C::AbstractArray{T, 3}, B::AbstractArray{T, 3}, A::SymmetricMatrix{T}) where T
Multiply the symmetric matrix A
onto the tensor B
from the right and store the result in C
.
Also checks the bounds of the input arrays.
This performs an efficient multiplication based on the special structure of the symmetric matrix A
.
GeometricMachineLearning.tensor_mat_mul
— Methodtensor_mat_mul(A::AbstractArray{T, 3}, B::AbstractArray{T}) where T
Multiply the matrix B
onto the tensor A
from the right.
Internally this calls the inplace version tensor_mat_mul!
.
Examples
using GeometricMachineLearning: tensor_mat_mul
A = [1 1 1; 1 1 1; 1 1 1;;; 2 2 2; 2 2 2; 2 2 2]
B = [3 0 0; 0 2 0; 0 0 1]
tensor_mat_mul(A, B)
# output
3×3×2 Array{Int64, 3}:
[:, :, 1] =
3 2 1
3 2 1
3 2 1
[:, :, 2] =
6 4 2
6 4 2
6 4 2
GeometricMachineLearning.tensor_mat_skew_sym_assign
— MethodTakes as input:
Z::AbstractArray{T, 3}
: A tensor that stores a bunch of time series.A::AbstractMatrix
: A matrix that is used to perform various scalar products.
For one of these time series the function performs the following computation:
\[ (z^{(i)}, z^{(j)}) \mapsto (z^{(i)})^TAz^{(j)} \text{ for } i > j.\]
The result of this is $n(n-1)\div2$ scalar products per time series. These scalar products are written into a lower-triangular matrix and the final output of the function is a tensor of these lower-triangular matrices.
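A plain-Julia sketch of the scalar products for a single time series (an illustration only; the library computes this with a kernel):
Z = rand(3, 4, 1)  # one time series consisting of four vectors z⁽ⁱ⁾
A = rand(3, 3)
C = [i > j ? Z[:, i, 1]' * A * Z[:, j, 1] : 0.0 for i in 1:4, j in 1:4]
# C is strictly lower-triangular; the library stacks one such matrix per time series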
GeometricMachineLearning.tensor_mat_skew_sym_assign_kernel!
— MethodA kernel that computes the weighted scalar products of all combinations of vectors in the matrix Z
except where the two vectors are the same, and writes the result into a tensor of skew-symmetric matrices C
.
GeometricMachineLearning.train!
— Functiontrain!(...)
Perform the training of a neural network on data using a given training method.
It can be called as:
train!(neuralnetwork, data, optimizer = GradientOptimizer(1e-2), training_method; nruns = 1000, batch_size = default(data, type), showprogress = false)
Arguments
- neuralnetwork::LuxNeuralNetwork: the neural network, using the Lux backend
- data: the data (see TrainingData)
- optimizer = GradientOptimizer: the optimization method (see Optimizer)
- training_method: specifies the loss function used
- nruns: the number of iterations (with the default value shown above)
- batch_size: the size of the batch of data used for each step
GeometricMachineLearning.train!
— Methodtrain!(neuralnetwork, data, optimizer, training_method; nruns = 1000, batch_size, showprogress = false)
Arguments
- neuralnetwork::LuxNeuralNetwork: the neural network, using the Lux backend
- data::AbstractTrainingData: the data
GeometricMachineLearning.within_patch_index
— MethodBased on the coordinates i,j this returns the index within the batch.
GeometricMachineLearning.write_ones_kernel!
— MethodKernel that is needed for functions relating to SymmetricMatrix
and SkewSymMatrix
GeometricMachineLearning.Ω
— MethodΩ(Y::GrassmannManifold{T}, Δ::AbstractMatrix{T}) where T
Perform the canonical horizontal lift for the Grassmann manifold:
\[ \Delta \mapsto \Omega^{St}(Y, \Delta),\]
where $\Omega^{St}$ is the canonical horizontal lift for the Stiefel manifold.
Examples
using GeometricMachineLearning
E = GrassmannManifold(StiefelProjection(5, 2))
Δ = [0. 0.; 0. 0.; 2. 3.; 4. 5.; 6. 7.]
GeometricMachineLearning.Ω(E, Δ)
# output
5×5 SkewSymMatrix{Float64, Vector{Float64}}:
0.0 -0.0 -2.0 -4.0 -6.0
0.0 0.0 -3.0 -5.0 -7.0
2.0 3.0 0.0 -0.0 -0.0
4.0 5.0 0.0 0.0 -0.0
6.0 7.0 0.0 0.0 0.0
GeometricMachineLearning.Ω
— MethodΩ(Y::StiefelManifold{T}, Δ::AbstractMatrix{T}) where T
Perform the canonical horizontal lift for the Stiefel manifold:
\[ \Delta \mapsto (\mathbb{I} - \frac{1}{2}YY^T)\Delta{}Y^T - Y\Delta^T(\mathbb{I} - \frac{1}{2}YY^T).\]
Internally this performs
SkewSymMatrix(2 * (I(n) - .5 * Y * Y') * Δ * Y')
to save memory.
Examples
using GeometricMachineLearning
E = StiefelManifold(StiefelProjection(5, 2))
Δ = [0. -1.; 1. 0.; 2. 3.; 4. 5.; 6. 7.]
GeometricMachineLearning.Ω(E, Δ)
# output
5×5 SkewSymMatrix{Float64, Vector{Float64}}:
0.0 -1.0 -2.0 -4.0 -6.0
1.0 0.0 -3.0 -5.0 -7.0
2.0 3.0 0.0 -0.0 -0.0
4.0 5.0 0.0 0.0 -0.0
6.0 7.0 0.0 0.0 0.0
Note that the output of Ω
is a skew-symmetric matrix, i.e. an element of $\mathfrak{g}$.
GeometricMachineLearning.𝔄
— Method𝔄(A)
Compute $\mathfrak{A}(A) := \sum_{n=1}^\infty \frac{1}{n!} (A)^{n-1}.$
Implementation
This uses a Taylor expansion that iteratively adds terms with
while norm(Aⁿ) > ε
    mul!(A_temp, Aⁿ, A)
    Aⁿ .= A_temp
    rmul!(Aⁿ, inv(n))
    𝔄 += Aⁿ
    n += 1
end
until the norm of Aⁿ
becomes smaller than machine precision. The counter n
in the above algorithm is initialized as 2.
The matrices Aⁿ
and 𝔄
are initialized as the identity matrix.
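A numerical sanity check against the series definition (twelve terms of the sum suffice here since A is small):
using GeometricMachineLearning: 𝔄
A = 0.1 * rand(4, 4)
𝔄(A) ≈ sum(A^(n - 1) / factorial(n) for n in 1:12)
# output
true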
GeometricMachineLearning.𝔄
— Method𝔄(B̂, B̄)
Compute $\mathfrak{A}(B', B'') := \sum_{n=1}^\infty \frac{1}{n!} ((B'')^TB')^{n-1}.$
This expression has the property $\mathbb{I} + B'\mathfrak{A}(B', B'')(B'')^T = \exp(B'(B'')^T).$
Examples
using GeometricMachineLearning
using GeometricMachineLearning: 𝔄
import Random
Random.seed!(123)
B = rand(StiefelLieAlgHorMatrix, 10, 2)
B̂ = hcat(vcat(.5 * B.A, B.B), vcat(one(B.A), zero(B.B)))
B̄ = hcat(vcat(one(B.A), zero(B.B)), vcat(-.5 * B.A, -B.B))
one(B̂ * B̄') + B̂ * 𝔄(B̂, B̄) * B̄' ≈ exp(Matrix(B))
# output
true