Standard Transformer

The transformer is a relatively modern neural network architecture [26] that has come to dominate the field of natural language processing (NLP, [47]) and has replaced the previously dominant long short-term memory networks (LSTMs, [43]). Its success is due to a variety of factors:

  • unlike LSTMs, it consists of very simple building blocks and is hence easier to interpret mathematically,
  • it is very flexible in its application, and the data it is fed do not have to conform to a rigid pattern,
  • transformers utilize modern hardware (especially GPUs) very effectively.

The transformer architecture is sketched below:

It is nothing more than a combination of a multi-head attention layer and a residual neural network[1] (ResNet).
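To make this structure concrete, the following is a minimal, self-contained Julia sketch of one such block: single-head self-attention followed by a ResNet step, each wrapped in a residual connection. All names and the single-head simplification are illustrative; this is a sketch of the idea, not the implementation used by GeometricMachineLearning.

```julia
# Minimal sketch of one transformer block: attention + ResNet,
# each with a residual (skip) connection. Illustrative only.

# column-wise softmax
softmax_cols(A) = exp.(A) ./ sum(exp.(A); dims = 1)

# single-head self-attention on a d × T input sequence X
function attention(X, W_Q, W_K, W_V)
    Q, K, V = W_Q * X, W_K * X, W_V * X
    V * softmax_cols(K' * Q / sqrt(size(Q, 1)))
end

# ResNet step: ResNet(x) = x + NN(x) (cf. footnote [1])
resnet(X, W, b) = X .+ tanh.(W * X .+ b)

function transformer_block(X, W_Q, W_K, W_V, W, b)
    X = X + attention(X, W_Q, W_K, W_V)  # residual around attention
    resnet(X, W, b)                      # residual around the feedforward step
end
```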

Library Functions

GeometricMachineLearning.StandardTransformerIntegrator - Type

The regular transformer used as an integrator (multi-step method).

The constructor is called with one argument:

  • sys_dim::Int

The following are keyword arguments:

  • transformer_dim::Int: the default is transformer_dim = sys_dim.
  • n_blocks::Int: the default is 1.
  • n_heads::Int: the number of heads in the multi-head attention layer (default is n_heads = sys_dim).
  • L::Int: the number of transformer blocks (default is L = 2).
  • upscaling_activation: by default identity.
  • resnet_activation: by default tanh.
  • add_connection::Bool = true: whether the input should be added to the output.
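As a usage sketch (assuming the package is installed; the keyword values below are illustrative choices, not recommendations):

```julia
using GeometricMachineLearning

# transformer integrator for a 4-dimensional system;
# all keyword values are illustrative
arch = StandardTransformerIntegrator(4; transformer_dim = 8,
                                        n_heads = 2,
                                        L = 2,
                                        resnet_activation = tanh)
```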
  • [1] A ResNet is simply a neural network whose input is added back to its output, i.e. every ResNet is of the form $\mathrm{ResNet}(x) = x + \mathcal{NN}(x)$.