Standard Transformer
The transformer is a relatively modern neural network architecture [26] that has come to dominate the field of natural language processing (NLP, [47]) and has replaced the previously dominant long short-term memory cells (LSTM, [43]). Its success is due to a variety of factors:
- unlike LSTMs, it consists of very simple building blocks and is hence easier to interpret mathematically,
- it is very flexible in its application, and the data it is fed need not conform to a rigid pattern,
- transformers utilize modern hardware (especially GPUs) very effectively.
The transformer architecture is nothing more than a combination of a multihead attention layer and a residual neural network[1] (ResNet).
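The combination of these two building blocks can be sketched in a few lines. The following is a minimal NumPy illustration, not the library's Julia implementation: the single-matrix attention (no separate query/key/value projections) and the explicit ResNet weights `W1, b1, W2, b2` are simplifying assumptions made for this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multihead_attention(X, n_heads):
    # X has shape (dim, T): each column is one time step.
    dim, T = X.shape
    assert dim % n_heads == 0
    d = dim // n_heads
    outputs = []
    for h in range(n_heads):
        Xh = X[h * d:(h + 1) * d, :]                    # slice belonging to head h
        scores = softmax(Xh.T @ Xh / np.sqrt(d), axis=0)  # (T, T), columns sum to 1
        outputs.append(Xh @ scores)                      # reweight the time steps
    return np.concatenate(outputs, axis=0)

def transformer_block(X, n_heads, W1, b1, W2, b2):
    # attention with a residual (add) connection ...
    Y = X + multihead_attention(X, n_heads)
    # ... followed by a ResNet step: Y + NN(Y) with a tanh activation
    return Y + W2 @ np.tanh(W1 @ Y + b1) + b2
```

A full transformer stacks several such blocks; the shape of the input (here `(dim, T)`) is preserved throughout, which is what makes the residual additions possible.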
Library Functions
`GeometricMachineLearning.StandardTransformerIntegrator` — Type

The regular transformer used as an integrator (multi-step method).

The constructor is called with one argument:
- `sys_dim::Int`

The following are keyword arguments:
- `transformer_dim::Int`: the default is `transformer_dim = sys_dim`.
- `n_blocks::Int`: the default is `1`.
- `n_heads::Int`: the number of heads in the multihead attention layer (default is `n_heads = sys_dim`).
- `L::Int`: the number of transformer blocks (default is `L = 2`).
- `upscaling_activation`: by default `identity`.
- `resnet_activation`: by default `tanh`.
- `add_connection::Bool=true`: if the input should be added to the output.
- [1] A ResNet is nothing more than a neural network to whose output we again add the input, i.e. every ResNet is of the form $\mathrm{ResNet}(x) = x + \mathcal{NN}(x)$.
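The residual form $\mathrm{ResNet}(x) = x + \mathcal{NN}(x)$ is easy to verify in code. Below is a hedged one-layer NumPy sketch (the weight matrix `W`, bias `b`, and `tanh` activation are illustrative assumptions, not the library's implementation): with zero weights the inner network outputs $\tanh(0) = 0$, so the ResNet reduces to the identity map.

```python
import numpy as np

def resnet_layer(x, W, b, activation=np.tanh):
    # ResNet(x) = x + NN(x): the input is added back to the network output
    return x + activation(W @ x + b)

# with zero weights the inner network contributes nothing,
# so the layer acts as the identity
x = np.ones(3)
assert np.allclose(resnet_layer(x, np.zeros((3, 3)), np.zeros(3)), x)
```

This "identity at initialization" behavior is one reason residual connections ease the training of deep networks.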