Neural Network Integrators
In GeometricMachineLearning we can divide most neural network architectures (that are used for applications to physical systems) into two categories: autoencoders and integrators. This division is closely related to reduced order modeling, where autoencoders are used in the offline phase and integrators are used in the online phase.
The term integrator in its most general form refers to an approximation of the flow of an ODE by a numerical scheme. Traditionally, for so-called one-step methods, these numerical schemes are constructed by defining a relationship between a known time step $z^{(t)}$ and a future unknown one $z^{(t+1)}$ [1, 82]:
\[ f(z^{(t)}, z^{(t+1)}) = 0.\]
One usually refers to such a relationship as an integration scheme. If this relationship can be reformulated as
\[ z^{(t+1)} = g(z^{(t)}),\]
then we refer to the scheme as explicit; if it cannot be reformulated in such a way, we refer to it as implicit. Implicit schemes are typically more expensive to solve than explicit ones. The Julia library GeometricIntegrators [2] offers a wide variety of integration schemes, both implicit and explicit.
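To make the distinction concrete, consider the ODE $\dot{z} = X(z)$ with time step $h$. The explicit Euler method is of the form $z^{(t+1)} = g(z^{(t)})$ with
\[ g(z^{(t)}) = z^{(t)} + hX(z^{(t)}),\]
whereas the implicit Euler method is only given through the relationship
\[ f(z^{(t)}, z^{(t+1)}) = z^{(t+1)} - z^{(t)} - hX(z^{(t+1)}) = 0,\]
which in general cannot be solved for $z^{(t+1)}$ in closed form and therefore requires e.g. a Newton iteration in every step.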
The neural network integrators in GeometricMachineLearning (the corresponding type is NeuralNetworkIntegrator) are all explicit integration schemes where the function $g$ above is modeled with a neural network.
Neural networks, as an alternative to traditional methods, are employed because of (i) potentially superior performance and (ii) an ability to learn unknown dynamics from data.
The simplest such neural network for modeling an explicit integrator is the ResNet. SympNets can be seen as the symplectic version of the ResNet; an example demonstrating the performance of SympNets illustrates the advantages of symplectic neural networks.
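Here symplecticity refers to a property of the learned map $g: \mathbb{R}^{2n} \to \mathbb{R}^{2n}$: its Jacobian satisfies
\[ (\nabla_z g)^T \mathbb{J}_{2n} (\nabla_z g) = \mathbb{J}_{2n}, \quad \text{where } \mathbb{J}_{2n} = \begin{pmatrix} \mathbb{O} & \mathbb{I}_n \\ -\mathbb{I}_n & \mathbb{O} \end{pmatrix},\]
which is precisely the property the exact flow of a Hamiltonian system has.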
Multi-step methods
Multi-step method [58, 59] refers to schemes that are of the form[1]:
\[ f(z^{(t - \mathtt{sl} + 1)}, z^{(t - \mathtt{sl} + 2)}, \ldots, z^{(t)}, z^{(t + 1)}, \ldots, z^{(t + \mathtt{pw})}) = 0,\]
where sl is short for sequence length and pw is short for prediction window. Note that we can recover traditional one-step methods by setting sl and pw equal to 1. We can also formulate explicit multi-step methods; they are of the form:
\[[z^{(t+1)}, \ldots, z^{(t+\mathtt{pw})}] = g(z^{(t - \mathtt{sl} + 1)}, \ldots, z^{(t)}).\]
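To illustrate how such an explicit multi-step map produces a trajectory, the following plain-Julia sketch repeatedly feeds the last sl columns of the trajectory into a map g and appends the pw columns it returns. The function rollout and the toy map g are hypothetical and only serve as an illustration; they are not part of GeometricMachineLearning:
function rollout(g, ics::AbstractMatrix, n_steps, sl)
    trajectory = copy(ics)                          # ics contains the first sl vectors as columns
    for _ in 1:n_steps
        window = trajectory[:, end - sl + 1:end]    # the last sl vectors form the input sequence
        trajectory = hcat(trajectory, g(window))    # append the pw predicted vectors
    end
    trajectory
end

sl, pw, dim = 3, 3, 2
g(window) = window .+ 0.1                           # toy map returning pw (= sl) new columns
size(rollout(g, rand(dim, sl), 5, sl))              # (2, 18): 3 initial vectors plus 5 × 3 predicted ones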
In GeometricMachineLearning all multi-step methods, as is the case with one-step methods, are explicit. There are essentially two ways to construct multi-step methods with neural networks: the older one uses recurrent neural networks such as long short-term memory cells (LSTMs) [83] and the newer one uses transformer neural networks [54]. Both of these approaches have been successfully employed to learn multi-step methods (see [61, 62] for the former and [4, 84, 85] for the latter), but because the transformer architecture exhibits superior performance on modern hardware and can be imbued with geometric properties, we almost always use a transformer-derived architecture when dealing with time series[2].
Explicit multi-step methods derived from the transformer are always subtypes of the type TransformerIntegrator in GeometricMachineLearning. In GeometricMachineLearning the standard transformer, the volume-preserving transformer and the linear symplectic transformer are implemented.
For standard multi-step methods (that are not neural network-based) sl is generally a number greater than one, whereas pw = 1 in most cases. For the TransformerIntegrators in GeometricMachineLearning, however, we usually have:
\[ \mathtt{pw} = \mathtt{sl},\]
so the number of vectors in the input sequence is equal to the number of vectors in the output sequence. This makes it easier to define structure-preservation for these architectures and improves stability.
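With $\mathtt{sl} = \mathtt{pw} = 2$, for example, the integrator maps
\[ [z^{(t-1)}, z^{(t)}] \mapsto [z^{(t+1)}, z^{(t+2)}],\]
and because the output again consists of two vectors it can be fed back into the integrator to obtain $[z^{(t+3)}, z^{(t+4)}]$; the scheme can thus be composed with itself to produce arbitrarily long trajectories.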
Library Functions
GeometricMachineLearning.NeuralNetworkIntegrator — Type
NeuralNetworkIntegrator is a supertype of various neural network architectures such as SympNet and ResNet.
The purpose of such neural networks is to approximate the flow of an ordinary differential equation (ODE).
NeuralNetworkIntegrators can be seen as modeling traditional one-step methods with neural networks, i.e. for a fixed time step they perform:
\[ \mathtt{NeuralNetworkIntegrator}: z^{(t)} \mapsto z^{(t+1)},\]
to try to approximate the flow of some ODE:
\[ || \mathtt{Integrator}(z^{(t)}) - \varphi^h(z^{(t)}) || \approx \mathcal{O}(h),\]
where $\varphi^h$ is the flow map of the ODE for a time step $h$.
GeometricMachineLearning.ResNet — Type
ResNet(dim, n_blocks, activation)
Make an instance of a ResNet.
A ResNet is a neural network that realizes a mapping of the form:
\[ x \mapsto \mathcal{NN}(x) + x,\]
so the input is added back onto the output (a so-called add connection). In GeometricMachineLearning the specific ResNet that we use consists of a series of simple ResNetLayers.
Constructor
ResNet can also be called with the constructor:
ResNet(dl, n_blocks)
where dl is an instance of DataLoader. See iterate for an example of this.
GeometricMachineLearning.ResNetLayer — Type
ResNetLayer(dim)
Make an instance of a ResNetLayer.
The ResNetLayer is a simple feedforward neural network layer to which we add the input after applying it, i.e. it realizes $x \mapsto x + \sigma(Ax + b)$.
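As a stand-alone illustration of this map (independent of the parameter handling in GeometricMachineLearning, and with arbitrarily chosen values for $A$, $b$ and $\sigma$), one could write:
σ = tanh                                # any activation function
A = [0.0 1.0; -1.0 0.0]                 # weight matrix (arbitrary values for illustration)
b = [0.1, 0.2]                          # bias vector
resnet_layer(x) = x + σ.(A * x .+ b)    # realizes x ↦ x + σ(Ax + b)

resnet_layer([1.0, 2.0])                # the input is added back onto the layer output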
Arguments
The ResNet layer takes the following arguments:
dim::Integer: the system dimension.
activation = identity: the activation function.
The following is a keyword argument:
use_bias::Bool = true: this determines whether a bias $b$ is used.
Base.iterate — Method
iterate(nn, ics)
This function computes a trajectory for a NeuralNetworkIntegrator that has already been trained, for validation purposes.
It takes as input:
nn: a NeuralNetwork (that has been trained).
ics: initial conditions (a NamedTuple of two vectors).
Examples
To demonstrate iterate we use a simple ResNet with weight $A = \mathrm{diag}(1, 2, 1)$ and bias $b = (0, 0, 1)^T$; together with the add connection it realizes:
\[\mathrm{ResNet}: x \mapsto x + Ax + b = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2\end{pmatrix}x + \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}\]
and we iterate three times with
\[ \mathtt{ics} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.\]
using GeometricMachineLearning
model = ResNet(3, 0, identity)
weight = [1 0 0; 0 2 0; 0 0 1]
bias = [0, 0, 1]
ps = NeuralNetworkParameters((L1 = (weight = weight, bias = bias), ))
nn = NeuralNetwork(model, Chain(model), ps, CPU())
ics = [1, 1, 1]
iterate(nn, ics; n_points = 4)
# output
3×4 Matrix{Int64}:
1 2 4 8
1 3 9 27
1 3 7 15
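This output can be checked by hand: the layer realizes $x \mapsto x + Ax + b$, i.e. multiplication by $\mathrm{diag}(2, 3, 2)$ followed by adding $(0, 0, 1)^T$, so
\[ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \mapsto \begin{pmatrix} 2 \\ 3 \\ 3 \end{pmatrix} \mapsto \begin{pmatrix} 4 \\ 9 \\ 7 \end{pmatrix} \mapsto \begin{pmatrix} 8 \\ 27 \\ 15 \end{pmatrix},\]
and these iterates are exactly the columns of the matrix above, the first column being the initial condition itself.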
Arguments
The optional keyword argument is:
n_points = 100: the number of integration steps that should be performed.
GeometricMachineLearning.TransformerIntegrator — Type
TransformerIntegrator <: Architecture
Encompasses various transformer architectures, such as the VolumePreservingTransformer and the LinearSymplecticTransformer.
The central idea behind this is to construct an explicit multi-step integrator:
\[ \mathtt{Integrator}: [ z^{(t - \mathtt{sl} + 1)}, z^{(t - \mathtt{sl} + 2)}, \ldots, z^{(t)} ] \mapsto [ z^{(t + 1)}, z^{(t + 2)}, \ldots, z^{(t + \mathtt{pw})} ],\]
where sl stands for sequence length and pw stands for prediction window, i.e. the number of input and output vectors respectively.
Base.iterate — Method
iterate(nn, ics)
Iterate the neural network of type TransformerIntegrator for the initial conditions ics.
The initial condition is a matrix $\in\mathbb{R}^{n\times\mathtt{seq\_length}}$ (or a NamedTuple of two such matrices).
This function computes a trajectory for a Transformer that has already been trained, for validation purposes.
Parameters
The following are optional keyword arguments:
n_points::Int = 100: the number of time steps for which we run the prediction.
prediction_window::Int = size(ics.q, 2): the prediction window (i.e. the number of steps we predict into the future) is equal to the sequence length (i.e. the number of input time steps) by default.
References
- [1] E. Hairer, C. Lubich and G. Wanner. Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations (Springer, Heidelberg, 2006).
- [82] B. Leimkuhler and S. Reich. Simulating Hamiltonian Dynamics. No. 14 (Cambridge University Press, 2004).
- [2] M. Kraus. GeometricIntegrators.jl: Geometric Numerical Integration in Julia. https://github.com/JuliaGNI/GeometricIntegrators.jl.
- [56] K. Feng. The step-transition operators for multi-step methods of ODE's. Journal of Computational Mathematics, 193–202 (1998).
- [54] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser and I. Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).
- [84] A. Hemmasian and A. Barati Farimani. Reduced-order modeling of fluid flows with transformers. Physics of Fluids 35 (2023).
- [85] A. Solera-Rico, C. S. Vila, M. Gómez, Y. Wang, A. Almashjary, S. Dawson and R. Vinuesa. $\beta$-Variational autoencoders and transformers for reduced-order modelling of fluid flows. arXiv preprint arXiv:2304.03571 (2023).
- [4] B. Brantner, G. de Romemont, M. Kraus and Z. Li. Volume-Preserving Transformers for Learning Time Series Data with Structure. arXiv preprint arXiv:2312.11166v2 (2024).