Conclusion
This dissertation showed how neural networks can be imbued with structure to improve their approximation capabilities when applied to physical systems. In the following we summarize the novelties of this work and give an outlook on how it can be extended in the future.
Reduced Order Modeling as Motivation
Most of the work presented in this dissertation is motivated by data-driven reduced order modeling, the discipline of building low-dimensional surrogate models from data that come from high-dimensional full order models. Both the low-dimensional surrogate model and the high-dimensional full order model are described by differential equations. When we speak of structure-preserving reduced order modeling we mean that the equation on the low-dimensional space shares features with the equation on the high-dimensional space; in this work these features were mainly symplecticity and divergence-freeness of the vector field. A typical reduced order modeling framework is further divided into two phases:
- in the offline phase we find the low-dimensional surrogate model (reduced representation) and
- in the online phase we solve the equations in the reduced space.
For the offline phase we proposed symplectic autoencoders, and for the online phase we proposed volume-preserving transformers and linear symplectic transformers. In the following we summarize the three main methods that were developed in the course of this dissertation and constitute its main results: symplectic autoencoders, structure-preserving transformers and structure-preserving optimizers.
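To make the two phases more concrete, the following is a minimal sketch of a generic offline/online workflow. The functions `encoder`, `decoder` and `reduced_vector_field` are hypothetical stand-ins rather than the architectures developed in this dissertation, and the explicit Euler step is used only for brevity.

```python
import numpy as np

# Hypothetical stand-ins for a generic reduced order modeling workflow.
# The offline phase would fit the encoder/decoder pair to full order snapshots;
# the online phase only integrates the low-dimensional surrogate system.

def encoder(x_full):
    """Map a full order state (dimension 2N) to a reduced state (dimension 2n)."""
    return x_full[:2]                        # placeholder for a learned encoder

def decoder(x_reduced):
    """Map a reduced state back to the full order space."""
    x_full = np.zeros(400)
    x_full[:2] = x_reduced                   # placeholder for a learned decoder
    return x_full

def reduced_vector_field(x_reduced):
    """Vector field of the surrogate model on the reduced space."""
    q, p = x_reduced
    return np.array([p, -q])                 # placeholder dynamics

def online_phase(x0_full, dt=0.01, steps=100):
    """Integrate the reduced dynamics (explicit Euler only for brevity)."""
    z = encoder(x0_full)
    for _ in range(steps):
        z = z + dt * reduced_vector_field(z)
    return decoder(z)

x_final = online_phase(np.random.default_rng(0).normal(size=400))
```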
Structure-Preserving Reduced Order Modeling of Hamiltonian Systems - The Offline Phase
A central part of this dissertation was the development of symplectic autoencoders [3]. Symplectic autoencoders build on existing approaches of symplectic neural networks (SympNets) [5] and proper symplectic decomposition (PSD) [68], both of which preserve symplecticity. SympNets can approximate arbitrary canonical symplectic maps in $\mathbb{R}^{2n},$ i.e.
\[ \mathrm{SympNet}: \mathbb{R}^{2n} \to \mathbb{R}^{2n},\]
but the input necessarily has the same dimension as the output. PSD can change dimension, i.e.[0]
\[ \mathrm{PSD}^\mathrm{enc}: \mathbb{R}^{2N} \to \mathbb{R}^{2n},\]
but is strictly linear. Symplectic autoencoders offer a way of (i) constructing nonlinear symplectic maps that (ii) can change dimension. We used these to reduce a 400-dimensional Hamiltonian system to a two-dimensional one[1]:
\[(\mathbb{R}^{400}, H) \xRightarrow{\mathrm{SAE}^\mathrm{enc}} (\mathbb{R}^2, \bar{H}).\]
For this case we observed speed-ups of up to a factor of 1000 when a symplectic autoencoder was combined with a transformer in the online phase. We also compared the symplectic autoencoder to a PSD and showed that the PSD was unable to learn a useful two-dimensional representation.
Like PSD, symplectic autoencoders have the property that they induce a Hamiltonian system on the reduced space. This distinguishes them from "weakly symplectic autoencoders" [70, 72], which only approximately obtain a Hamiltonian system on a restricted domain by using a "physics-informed neural network" [71] approach.
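To illustrate the idea of combining nonlinear symplectic layers with linear dimension-changing layers, consider the following toy encoder. The shear layer and the cotangent-lift-style reduction below are simplified stand-ins chosen for brevity, not the exact layers used in this dissertation.

```python
import numpy as np

# A simplified sketch of the symplectic autoencoder idea: alternate nonlinear
# symplectic (SympNet-like) layers with PSD-like linear layers that change the
# dimension. The concrete layers below are illustrative stand-ins.

rng = np.random.default_rng(0)
N, n = 200, 1   # full dimension 2N = 400, reduced dimension 2n = 2

def sympnet_shear(q, p, K, b):
    """Symplectic shear: (q, p) -> (q, p + grad V(q)) with V(q) = sum(softplus(K q + b))."""
    grad_V = K.T @ (1.0 / (1.0 + np.exp(-(K @ q + b))))   # sigmoid = derivative of softplus
    return q, p + grad_V

def psd_like_encoder(q, p, Phi):
    """Cotangent-lift-style linear reduction with a column-orthonormal Phi."""
    return Phi.T @ q, Phi.T @ p

# parameters of the toy encoder
K = rng.normal(size=(N, N)) / np.sqrt(N)
b = rng.normal(size=N)
Phi, _ = np.linalg.qr(rng.normal(size=(N, n)))            # orthonormal columns

def sae_encoder(x):
    q, p = x[:N], x[N:]
    q, p = sympnet_shear(q, p, K, b)       # nonlinear symplectic step in R^{2N}
    q, p = psd_like_encoder(q, p, Phi)     # linear reduction to R^{2n}
    return np.concatenate([q, p])

z = sae_encoder(rng.normal(size=2 * N))    # z has dimension 2n = 2
```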
We also mention that the development of symplectic autoencoders required generalizing existing neural network optimizers to manifolds[2]. This is further discussed below.
Structure-Preserving Neural Network-Based Integrators - The Online Phase
For the online phase of reduced order modeling we developed new neural network architectures based on the transformer [54], a neural network architecture that is used extensively in other fields of neural network research such as natural language processing[3]. We used transformers to build a neural network-based equivalent of structure-preserving multi-step methods [1].
The transformer consists of a composition of standard neural network layers and attention layers:
\[ \mathrm{Transformer}(Z) = \mathcal{NN}_n\circ\mathrm{AttentionLayer}_n\circ\cdots\circ\mathcal{NN}_1\circ\mathrm{AttentionLayer}_1(Z),\]
where $\mathcal{NN}$ indicates a standard neural network layer (e.g. a multilayer perceptron). The attention layer makes it possible for a transformer to process time series data by acting on a whole series of vectors at once:
\[ \mathrm{AttentionLayer}(Z) = \mathrm{AttentionLayer}(z^{(1)}, \ldots, z^{(T)}) = [f^1(z^{(1)}, \ldots, z^{(T)}), \ldots, f^T(z^{(1)}, \ldots, z^{(T)})].\]
The attention layer thus performs a preprocessing step after which the standard neural network layer $\mathcal{NN}$ is applied.
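As an illustration of this structure, the following is a minimal sketch of a standard single-head, scaled dot-product attention layer acting on a sequence of $T$ vectors stored as columns; it is not the modified, structure-preserving reweighting discussed below.

```python
import numpy as np

# Minimal sketch of a standard single-head attention layer acting on a sequence
# Z = [z^(1), ..., z^(T)] stored column-wise. Each output column depends on the
# whole sequence, which is what allows a transformer to process time series.

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_layer(Z, W_Q, W_K, W_V):
    """Each output column is a weighted recombination of all columns of Z."""
    Q, K, V = W_Q @ Z, W_K @ Z, W_V @ Z
    A = softmax(K.T @ Q / np.sqrt(Q.shape[0]), axis=0)   # T x T reweighting matrix
    return V @ A

rng = np.random.default_rng(0)
d, T = 4, 3                                  # feature dimension and sequence length
Z = rng.normal(size=(d, T))
W_Q, W_K, W_V = (rng.normal(size=(d, d)) for _ in range(3))
Z_out = attention_layer(Z, W_Q, W_K, W_V)    # same shape as Z
```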
In this dissertation we presented two modifications of the standard transformer: the volume-preserving transformer [4] and the linear symplectic transformer. In both cases we modified the attention mechanism so that it is either volume-preserving (in the first case) or symplectic (in the second case). The standard neural network layer $\mathcal{NN}$ was replaced by a volume-preserving feedforward neural network or a symplectic neural network [5] respectively.
In this dissertation we applied the volume-preserving transformer to learning the trajectory of a rigid body and the linear symplectic transformer to learning the trajectory of a coupled harmonic oscillator. In both cases our new transformer architecture significantly outperformed the standard transformer. The trajectory modeled with the volume-preserving transformer, for instance, stays very close to the submanifold given by a level set of the quadratic invariant $I(z_1, z_2, z_3) = z_1^2 + z_2^2 + z_3^2.$ This is not the case for the standard transformer: it moves away from this submanifold after a few time steps.
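For completeness, here is a small sketch of the diagnostic just described: measuring how far a predicted rigid body trajectory drifts from the level set of the quadratic invariant. The `trajectory` array is a random placeholder for a network prediction.

```python
import numpy as np

# Measure the drift of a trajectory away from the level set of the quadratic
# invariant I(z1, z2, z3) = z1^2 + z2^2 + z3^2. `trajectory` is a placeholder
# for the network prediction, with shape (timesteps, 3).

def invariant_drift(trajectory):
    I = np.sum(trajectory ** 2, axis=1)      # I(z) along the trajectory
    return np.abs(I - I[0])                  # deviation from the initial level set

trajectory = np.random.default_rng(0).normal(size=(100, 3))  # stand-in prediction
print(invariant_drift(trajectory).max())
```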
Structure-Preserving Optimizers
Training a symplectic autoencoder requires optimization on manifolds[4]. The particular manifolds we need in this case are "homogeneous spaces" [109]. In this dissertation we proposed a new optimizer framework that generalizes existing neural network optimizers to manifolds. This is achieved by identifying a global tangent space representation, which dispenses with the projection step that is necessary in other approaches [8, 110].
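For illustration only, the following is a generic sketch of one Riemannian gradient step on the Stiefel manifold (matrices with orthonormal columns), using a tangent space projection and a QR-based retraction. It is meant to show what optimization on manifolds entails in general and is not the framework proposed in this dissertation, which works with a global tangent space representation instead.

```python
import numpy as np

# One Riemannian gradient descent step on the Stiefel manifold St(n, N),
# i.e. the set of N x n matrices with orthonormal columns.

def project_to_tangent(Y, G):
    """Project a Euclidean gradient G onto the tangent space of St(n, N) at Y."""
    YtG = Y.T @ G
    return G - Y @ (YtG + YtG.T) / 2

def qr_retraction(Y, V):
    """Retract the tangent vector V at Y back onto the manifold via QR."""
    Q, R = np.linalg.qr(Y + V)
    signs = np.where(np.diag(R) < 0, -1.0, 1.0)
    return Q * signs                          # fix column signs for a continuous step

def stiefel_gradient_step(Y, euclidean_grad, lr=1e-2):
    V = -lr * project_to_tangent(Y, euclidean_grad)
    return qr_retraction(Y, V)

rng = np.random.default_rng(0)
Y, _ = np.linalg.qr(rng.normal(size=(10, 3)))       # a point on St(3, 10)
G = rng.normal(size=(10, 3))                        # stand-in gradient from backprop
Y_new = stiefel_gradient_step(Y, G)
print(np.allclose(Y_new.T @ Y_new, np.eye(3)))      # columns stay orthonormal
```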
As was already observed by others [8, 9, 111], putting weights on manifolds can improve training significantly in contexts other than scientific computing. Motivated by this, we showed an example of training a vision transformer [88] on the MNIST data set [90] to demonstrate the efficacy of the new optimizers. Contrary to other applications of the transformer, we do not have to rely on layer normalization [112] or residual connections to achieve convergent training for relatively big neural networks. We also applied the new optimizers to a neural network with weights on the Grassmann manifold in order to sample from a nonlinear space.
Outlook
We believe that structure-preserving autoencoders, structure-preserving transformers, structure-preserving optimizers and structure-preserving machine learning in general offer great potential for future research.
Symplectic autoencoders could be used for model reduction of higher-dimensional systems [113] as well as for treating systems that are more general than canonical Hamiltonian ones; these include port-Hamiltonian [73] and metriplectic [114] systems. Structure-preserving model order reduction for such systems has been proposed [79, 115–117], but without using neural networks. In the appendix we sketch how symplectic autoencoders could be used for structure-preserving model reduction of port-Hamiltonian systems.
Structure-preserving transformers have shown great potential for learning dynamical systems, but their application should not be limited to that area. Structure-preserving machine learning techniques such as Hamiltonian Monte Carlo [118] have been used in various fields such as image classification [119] and inverse problems [120], and we believe that the structure-preserving transformers introduced in this work can also find applications in these fields, for example by replacing the activation function in the attention layers of a vision transformer.
Lastly, structure-preserving optimization is an exciting field, especially with regard to neural networks. The manifold optimizers introduced in this work can speed up neural network training significantly and are suitable for modern hardware (i.e. GPUs). They are however based on existing neural network optimizers such as Adam [108] and thus still lack a clear geometric interpretation. By utilizing a more geometric representation, as presented in this work, we hope to be able to find a differential equation describing Adam and other neural network optimizers, perhaps through a variational principle [121, 122]. One could also build on the existing optimization framework and use retractions other than the geodesic retraction and the Cayley retraction presented here; an example would be a QR-based retraction [123, 124]. This is left for future work.
- 0Here we only show the PSD encoder $\mathrm{PSD}^\mathrm{enc}.$ A complete reduced order modeling framework also needs a decoder $\mathrm{PSD}^\mathrm{dec}$ in addition to the encoder. When we use PSD both of these maps are linear, i.e. they can be represented as $(\mathrm{PSD}^\mathrm{enc})^T, \mathrm{PSD}^\mathrm{dec}\in\mathbb{R}^{2N\times{}2n}.$
- 1$\bar{H} = H\circ\Psi^\mathrm{dec}_{\theta_2}:\mathbb{R}^2\to\mathbb{R}$ here refers to the induced Hamiltonian on the reduced space. SAE is short for symplectic autoencoder.
- 2We also refer to optimizers that preserve manifold structure as structure-preserving optimizers.
- 3The T in ChatGPT [95] stands for transformer.
- 4This is necessary to preserve the symplectic structure of the neural network.