Volume-Preserving Transformer
Like the standard transformer, the volume-preserving transformer is a composition of two different neural networks: a volume-preserving attention layer and a volume-preserving feedforward layer.
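To make "volume-preserving" concrete: each constituent layer has Jacobian determinant one, and determinants multiply under composition, so the whole transformer preserves phase-space volume. Below is a minimal plain-Julia sketch (illustrative only, not library code) of the kind of shear map the feedforward part is built from:

```julia
# A shear map x ↦ x + nonlinear shift of one component by another:
# its Jacobian is unit lower-triangular, so det = 1 and volume is
# preserved. Composing such maps (together with volume-preserving
# attention) keeps the determinant of the full network at 1.
function lower_shear(x::AbstractVector, a::Real)
    q, p = x[1], x[2]
    [q, p + tanh(a * q)]    # only p changes, and not with respect to itself
end

x = [1.0, 2.0]
y = lower_shear(x, 0.5)     # volume-preserving update of x
```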
Library Functions
GeometricMachineLearning.VolumePreservingTransformer — Type

The volume-preserving transformer with the Cayley activation function and built-in upscaling.
Constructor
The constructor takes two positional arguments (a usage sketch follows the keyword list below):

- sys_dim::Int: The system dimension.
- seq_length::Int: The sequence length of the data fed into the transformer.
The following are keyword arguments:
- n_blocks::Int=1: The number of blocks in one transformer unit (containing linear layers and nonlinear layers). Default is 1.
- n_linear::Int=1: The number of linear VolumePreservingLowerLayer and VolumePreservingUpperLayer layers in one block. Default is 1.
- L::Int=1: The number of transformer units.
- activation=tanh: The activation function.
- init_upper::Bool=false: Specifies if the network first acts on the $q$ component.
- skew_sym::Bool=false: Specifies if the weight matrix is skew-symmetric or arbitrary.
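A minimal construction sketch based on the arguments above. The sizes and keyword values are illustrative, and passing the architecture to NeuralNetwork to obtain a network with initialized parameters follows the library's general pattern rather than anything stated in this docstring:

```julia
using GeometricMachineLearning

sys_dim, seq_length = 4, 3    # illustrative sizes

# assemble the architecture from the documented constructor arguments
arch = VolumePreservingTransformer(sys_dim, seq_length;
                                   n_blocks = 2,     # blocks per transformer unit
                                   n_linear = 1,     # lower/upper layers per block
                                   L = 2,            # number of transformer units
                                   activation = tanh,
                                   skew_sym = false) # general, not skew-symmetric, weights

# wrap the architecture to get a network with initialized parameters
# (assumed usage pattern, not part of this docstring)
nn = NeuralNetwork(arch)
```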
References
- [32] B. Brantner, G. de Romemont, M. Kraus and Z. Li. Volume-Preserving Transformers for Learning Time Series Data with Structure. arXiv preprint arXiv:2312.11166v2 (2024).