Volume-Preserving Transformer

The volume-preserving transformer is, like the standard transformer, a combination of two different neural networks: a volume-preserving attention layer and a volume-preserving feedforward layer.
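Conceptually, each transformer unit composes these two maps. The following is a minimal Julia sketch of that composition; `transformer_unit` and the placeholder layers are hypothetical illustrations, not part of the library:

```julia
# Schematic sketch of one transformer unit (an illustration under assumptions,
# not the library's internal implementation): the input sequence Z, of size
# sys_dim × seq_length, is first mapped by a volume-preserving attention layer
# and then by a volume-preserving feedforward block. Each map preserves volume,
# so their composition is again volume-preserving.
transformer_unit(Z, attention, feedforward) = feedforward(attention(Z))

# Placeholder example with identity maps standing in for the two layers:
Z  = rand(4, 3)   # sys_dim = 4, seq_length = 3
Z2 = transformer_unit(Z, identity, identity)
```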

Library Functions

GeometricMachineLearning.VolumePreservingTransformer — Type

The volume-preserving transformer with the Cayley activation function and built-in upscaling.

Constructor

The arguments for the constructor are:

  1. sys_dim::Int: The system dimension.
  2. seq_length::Int: The sequence length of the data fed into the transformer.

The following are keyword arguments; a usage sketch is given after the list:

  • n_blocks::Int=1: The number of blocks in one transformer unit (containing linear layers and nonlinear layers). Default is 1.
  • n_linear::Int=1: The number of linear VolumePreservingLowerLayers and VolumePreservingUpperLayers in one block. Default is 1.
  • L::Int=1: The number of transformer units.
  • activation=tanh: The activation function.
  • init_upper::Bool=false: Specifies if the network first acts on the $q$ component.
  • skew_sym::Bool=false: Specifies if the weight matrix is skew-symmetric or arbitrary.
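A usage sketch based on the constructor description above; the dimensions and keyword values are illustrative only, not recommended settings:

```julia
using GeometricMachineLearning

# Volume-preserving transformer for a 4-dimensional system acting on
# input sequences of length 3 (illustrative values).
model = VolumePreservingTransformer(4, 3;
    n_blocks   = 2,     # blocks per transformer unit
    n_linear   = 1,     # linear lower/upper layers per block
    L          = 2,     # number of transformer units
    activation = tanh,  # activation function
    init_upper = false, # whether the network first acts on the q component
    skew_sym   = false) # arbitrary (not skew-symmetric) weight matrix
```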

References

[32]
B. Brantner, G. de Romemont, M. Kraus and Z. Li. Volume-Preserving Transformers for Learning Time Series Data with Structure, arXiv preprint arXiv:2312.11166 (2024).