Neural Network Integrators
In GeometricMachineLearning we can divide most neural network architectures (that are used for applications to physical systems) into two categories: autoencoders and integrators. This division is closely related to reduced order modeling, where autoencoders are used in the offline phase and integrators are used in the online phase.
The term integrator in its most general form refers to an approximation of the flow of an ODE by a numerical scheme. Traditionally, for so-called one-step methods, these numerical schemes are constructed by defining a relationship between a known time step $z^{(t)}$ and a future unknown one $z^{(t+1)}$ [1, 82]:
\[ f(z^{(t)}, z^{(t+1)}) = 0.\]
One usually refers to such a relationship as an integration scheme. If this relationship can be reformulated as
\[ z^{(t+1)} = g(z^{(t)}),\]
then we refer to the scheme as explicit; if it cannot be reformulated in such a way, we refer to it as implicit. Implicit schemes are typically more expensive to solve than explicit ones. The Julia library GeometricIntegrators [2] offers a wide variety of integration schemes, both implicit and explicit.
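To make the distinction concrete, consider the ODE $\dot{z} = X(z)$ with time step $h$. The explicit Euler method is of the form $z^{(t+1)} = g(z^{(t)})$ with
\[ g(z^{(t)}) = z^{(t)} + hX(z^{(t)}),\]
whereas the implicit Euler method is only given through the relationship
\[ f(z^{(t)}, z^{(t+1)}) = z^{(t+1)} - z^{(t)} - hX(z^{(t+1)}) = 0,\]
which in general cannot be solved for $z^{(t+1)}$ in closed form and therefore requires e.g. a Newton iteration in every step.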
The neural network integrators in GeometricMachineLearning (the corresponding type is NeuralNetworkIntegrator) are all explicit integration schemes where the function $g$ above is modeled with a neural network.
Neural networks, as an alternative to traditional methods, are employed because of (i) potentially superior performance and (ii) an ability to learn unknown dynamics from data.
The simplest such neural network for modeling an explicit integrator is the ResNet. SympNets can be seen as the symplectic version of the ResNet; an example demonstrating the performance of SympNets illustrates the advantages of symplectic neural networks.
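Here symplecticity refers to a property of the learned map $g: \mathbb{R}^{2n} \to \mathbb{R}^{2n}$: its Jacobian satisfies
\[ (\nabla_z g)^T \mathbb{J}_{2n} (\nabla_z g) = \mathbb{J}_{2n}, \quad \text{where } \mathbb{J}_{2n} = \begin{pmatrix} \mathbb{O} & \mathbb{I}_n \\ -\mathbb{I}_n & \mathbb{O} \end{pmatrix},\]
which is precisely the property the exact flow of a Hamiltonian system has.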
Multi-step methods
Multi-step method [58, 59] refers to schemes that are of the form[1]:
\[ f(z^{(t - \mathtt{sl} + 1)}, z^{(t - \mathtt{sl} + 2)}, \ldots, z^{(t)}, z^{(t + 1)}, \ldots, z^{(t + \mathtt{pw})}) = 0,\]
where sl is short for sequence length and pw is short for prediction window. Note that we can recover traditional one-step methods by setting sl and pw equal to 1. We can also formulate explicit multi-step methods; they are of the form:
\[[z^{(t+1)}, \ldots, z^{(t+\mathtt{pw})}] = g(z^{(t - \mathtt{sl} + 1)}, \ldots, z^{(t)}).\]
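To illustrate how such an explicit multi-step map produces a trajectory, the following plain-Julia sketch repeatedly feeds the last sl columns of the trajectory into a map g and appends the pw columns it returns. The function rollout and the toy map g are hypothetical and only serve as an illustration; they are not part of GeometricMachineLearning:
function rollout(g, ics::AbstractMatrix, n_steps, sl)
    trajectory = copy(ics)                          # ics contains the first sl vectors as columns
    for _ in 1:n_steps
        window = trajectory[:, end - sl + 1:end]    # the last sl vectors form the input sequence
        trajectory = hcat(trajectory, g(window))    # append the pw predicted vectors
    end
    trajectory
end

sl, pw, dim = 3, 3, 2
g(window) = window .+ 0.1                           # toy map returning pw (= sl) new columns
size(rollout(g, rand(dim, sl), 5, sl))              # (2, 18): 3 initial vectors plus 5 × 3 predicted ones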
In GeometricMachineLearning all multi-step methods, as is the case with one-step methods, are explicit. There are essentially two ways to construct multi-step methods with neural networks: the older one uses recurrent neural networks such as long short-term memory cells (LSTMs) [83] and the newer one uses transformer neural networks [54]. Both of these approaches have been successfully employed to learn multi-step methods (see [61, 62] for the former and [4, 84, 85] for the latter), but because the transformer architecture exhibits superior performance on modern hardware and can be imbued with geometric properties, we almost always use a transformer-derived architecture when dealing with time series[2].
Explicit multi-step methods derived from the transformer are always subtypes of the type TransformerIntegrator in GeometricMachineLearning. In GeometricMachineLearning the standard transformer, the volume-preserving transformer and the linear symplectic transformer are implemented.
For standard multi-step methods (that are not neural network-based) sl is generally a number greater than one, whereas pw = 1 in most cases. For the TransformerIntegrators in GeometricMachineLearning, however, we usually have:
\[ \mathtt{pw} = \mathtt{sl},\]
so the number of vectors in the input sequence is equal to the number of vectors in the output sequence. This makes it easier to define structure-preservation for these architectures and improves stability.
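With $\mathtt{sl} = \mathtt{pw} = 2$, for example, the integrator maps
\[ [z^{(t-1)}, z^{(t)}] \mapsto [z^{(t+1)}, z^{(t+2)}],\]
and because the output again consists of two vectors it can be fed back into the integrator to obtain $[z^{(t+3)}, z^{(t+4)}]$; the scheme can thus be composed with itself to produce arbitrarily long trajectories.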
Library Functions
GeometricMachineLearning.NeuralNetworkIntegrator — Type
NeuralNetworkIntegrator is a supertype of various neural network architectures such as SympNet and ResNet.
The purpose of such neural networks is to approximate the flow of an ordinary differential equation (ODE).
NeuralNetworkIntegrators can be seen as modeling traditional one-step methods with neural networks, i.e. for a fixed time step they perform:
\[ \mathtt{NeuralNetworkIntegrator}: z^{(t)} \mapsto z^{(t+1)},\]
to try to approximate the flow of some ODE:
\[ || \mathtt{Integrator}(z^{(t)}) - \varphi^h(z^{(t)}) || \approx \mathcal{O}(h),\]
where $\varphi^h$ is the flow map of the ODE for a time step $h$.
GeometricMachineLearning.ResNet — Type
ResNet(dim, n_blocks, activation)
Make an instance of a ResNet.
A ResNet is a neural network that realizes a mapping of the form:
\[ x \mapsto \mathcal{NN}(x) + x,\]
so the input is added back onto the output (a so-called add connection). In GeometricMachineLearning the specific ResNet that we use consists of a series of simple ResNetLayers.
Constructor
ResNet can also be called with the constructor:
ResNet(dl, n_blocks)
where dl is an instance of DataLoader. See iterate for an example of this.
GeometricMachineLearning.ResNetLayer — Type
ResNetLayer(dim)
Make an instance of a ResNetLayer.
The ResNetLayer is a simple feedforward neural network layer to which we add the input after applying it, i.e. it realizes $x \mapsto x + \sigma(Ax + b)$.
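As a stand-alone illustration of this map (independent of the parameter handling in GeometricMachineLearning, and with arbitrarily chosen values for $A$, $b$ and $\sigma$), one could write:
σ = tanh                                # any activation function
A = [0.0 1.0; -1.0 0.0]                 # weight matrix (arbitrary values for illustration)
b = [0.1, 0.2]                          # bias vector
resnet_layer(x) = x + σ.(A * x .+ b)    # realizes x ↦ x + σ(Ax + b)

resnet_layer([1.0, 2.0])                # the input is added back onto the layer output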
Arguments
The ResNet layer takes the following arguments:
dim::Integer: the system dimension.
activation = identity: the activation function.
The following is a keyword argument:
use_bias::Bool = true: this determines whether a bias $b$ is used.
Base.iterate — Method
iterate(nn, ics)
This function computes a trajectory for a NeuralNetworkIntegrator that has already been trained, for validation purposes.
It takes as input:
nn: a NeuralNetwork (that has been trained).
ics: initial conditions (a NamedTuple of two vectors).
Examples
To demonstrate iterate we use a simple ResNet with weight $A = \mathrm{diag}(1, 2, 1)$ and bias $b = (0, 0, 1)^T$; together with the add connection it realizes:
\[\mathrm{ResNet}: x \mapsto x + Ax + b = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2\end{pmatrix}x + \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}\]
and we iterate three times with
\[ \mathtt{ics} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.\]
using GeometricMachineLearning
model = ResNet(3, 0, identity)
weight = [1 0 0; 0 2 0; 0 0 1]
bias = [0, 0, 1]
ps = NeuralNetworkParameters((L1 = (weight = weight, bias = bias), ))
nn = NeuralNetwork(model, Chain(model), ps, CPU())
ics = [1, 1, 1]
iterate(nn, ics; n_points = 4)
# output
3×4 Matrix{Int64}:
1 2 4 8
1 3 9 27
1 3 7 15
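This output can be checked by hand: the layer realizes $x \mapsto x + Ax + b$, i.e. multiplication by $\mathrm{diag}(2, 3, 2)$ followed by adding $(0, 0, 1)^T$, so
\[ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \mapsto \begin{pmatrix} 2 \\ 3 \\ 3 \end{pmatrix} \mapsto \begin{pmatrix} 4 \\ 9 \\ 7 \end{pmatrix} \mapsto \begin{pmatrix} 8 \\ 27 \\ 15 \end{pmatrix},\]
and these iterates are exactly the columns of the matrix above, the first column being the initial condition itself.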
Arguments
The optional keyword argument is:
n_points = 100: the number of integration steps that should be performed.
GeometricMachineLearning.TransformerIntegrator — Type
TransformerIntegrator <: Architecture
Encompasses various transformer architectures, such as the VolumePreservingTransformer and the LinearSymplecticTransformer.
The central idea behind this is to construct an explicit multi-step integrator:
\[ \mathtt{Integrator}: [ z^{(t - \mathtt{sl} + 1)}, z^{(t - \mathtt{sl} + 2)}, \ldots, z^{(t)} ] \mapsto [ z^{(t + 1)}, z^{(t + 2)}, \ldots, z^{(t + \mathtt{pw})} ],\]
where sl stands for sequence length and pw stands for prediction window, i.e. the number of input and output vectors respectively.
Base.iterate — Method
iterate(nn, ics)
Iterate the neural network of type TransformerIntegrator for the initial conditions ics.
The initial condition is a matrix $\in\mathbb{R}^{n\times\mathtt{seq\_length}}$ (or a NamedTuple of two such matrices).
This function computes a trajectory for a Transformer that has already been trained, for validation purposes.
Parameters
The following are optional keyword arguments:
n_points::Int = 100: the number of time steps for which we run the prediction.
prediction_window::Int = size(ics.q, 2): the prediction window (i.e. the number of steps we predict into the future) is equal to the sequence length (i.e. the number of input time steps) by default.
References
- [1] E. Hairer, C. Lubich and G. Wanner. Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations (Springer, Heidelberg, 2006).
- [82] B. Leimkuhler and S. Reich. Simulating Hamiltonian Dynamics. No. 14 (Cambridge University Press, 2004).
- [2] M. Kraus. GeometricIntegrators.jl: Geometric Numerical Integration in Julia. https://github.com/JuliaGNI/GeometricIntegrators.jl.
- [56] K. Feng. The step-transition operators for multi-step methods of ODE's. Journal of Computational Mathematics, 193–202 (1998).
- [54] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser and I. Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).
- [84] A. Hemmasian and A. Barati Farimani. Reduced-order modeling of fluid flows with transformers. Physics of Fluids 35 (2023).
- [85] A. Solera-Rico, C. S. Vila, M. Gómez, Y. Wang, A. Almashjary, S. Dawson and R. Vinuesa. $\beta$-Variational autoencoders and transformers for reduced-order modelling of fluid flows. arXiv preprint arXiv:2304.03571 (2023).
- [4] B. Brantner, G. de Romemont, M. Kraus and Z. Li. Volume-Preserving Transformers for Learning Time Series Data with Structure. arXiv preprint arXiv:2312.11166v2 (2024).