Structure-Preserving Neural Networks

What we mean by a structure-preserving neural network or a geometric neural network is a modification of a standard neural network such that it satisfies certain properties like symplecticity or volume preservation. We first define standard neural networks:

Definition

A neural network architecture is a parameter-dependent realization of a function:

\[ \mathrm{architecture}: \mathbb{P} \to \mathcal{C}(\mathcal{D}, \mathcal{M}), \Theta \mapsto \mathcal{NN}_\Theta,\]

where $\Theta$ are the parameters of the neural network (we call $\mathbb{P}$ the parameter space). In general $\mathbb{P}$, the domain space $\mathcal{D}$ and the target space $\mathcal{M}$ of the neural network may have arbitrary structure (i.e. they need not be vector spaces).

In this text the spaces $\mathcal{D}$ and $\mathcal{M}$ are vector spaces in most cases[1]. The parameter space $\mathbb{P}$ is however built from manifolds in many cases. Weights have to be put on manifolds to realize certain architectures that would otherwise not be possible; in other cases putting weights on manifolds can make training more efficient.
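To make the definition concrete, the following sketch (in Python with NumPy; the function and variable names are hypothetical and not taken from any particular library) realizes the map $\Theta \mapsto \mathcal{NN}_\Theta$ for the one-layer feedforward architecture from footnote [2]: it takes the parameters $\Theta = (A, b, C)$ and returns the function $x \mapsto C\sigma(Ax + b)$.

```python
import numpy as np

def architecture(theta, sigma=np.tanh):
    """Map parameters Theta = (A, b, C) to the realized network NN_Theta.

    One-layer feedforward architecture: NN_Theta(x) = C @ sigma(A @ x + b).
    """
    A, b, C = theta

    def nn(x):
        return C @ sigma(A @ x + b)

    return nn

# Example: input dimension n = 2, hidden width N = 16, output dimension m = 1.
rng = np.random.default_rng(0)
theta = (rng.normal(size=(16, 2)), rng.normal(size=16), rng.normal(size=(1, 16)))
nn = architecture(theta)           # NN_Theta, an element of C(R^2, R^1)
print(nn(np.array([0.5, -1.0])))   # evaluate the realized function at a point
```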

It is a classical result [30] that one-layer feedforward neural networks[2] are universal approximators:

Theorem

Neural networks are dense in the space of continuous functions $\mathcal{C}(U, \mathbb{R}^m)$ (with $U\subset\mathbb{R}^n$) with respect to the compact-open topology, i.e. for every compact subset $K\subset{}U,$ real number $\varepsilon>0$ and function $f\in\mathcal{C}(U, \mathbb{R}^m)$ we can find an integer $N$ as well as weights

\[ \Theta = (A, b, C) \in \mathbb{R}^{N\times{}n}\times\mathbb{R}^{N}\times\mathbb{R}^{m\times{}N} =: \mathbb{P}\]

such that

\[ \sup_{x \in K}|| f(x) - C\sigma(Ax + b) || < \varepsilon,\]

i.e. neural networks can approximate $f$ arbitrarily well on any compact set $K$.
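As an empirical illustration of the quantity in the theorem (not the construction used in its proof), the following sketch fixes randomly drawn $A$ and $b$, fits only the outer weights $C$ by least squares, and then estimates $\sup_{x \in K}\| f(x) - C\sigma(Ax + b)\|$ on a sampled grid; all names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target function f on the compact set K = [-pi, pi] (here n = m = 1).
f = np.sin
K = np.linspace(-np.pi, np.pi, 2001)

# One-layer network NN_Theta(x) = C sigma(A x + b) with hidden width N.
N = 64
A = rng.normal(size=(N, 1))
b = rng.normal(size=N)
sigma = np.tanh

# Hidden features sigma(A x + b) evaluated on the grid; shape (len(K), N).
H = sigma(K[:, None] * A[:, 0] + b)

# Fit only the outer weights C by least squares (an empirical shortcut,
# not the existence argument of the universal approximation theorem).
C, *_ = np.linalg.lstsq(H, f(K), rcond=None)

# Estimate the sup-norm error over the sampled grid.
sup_error = np.max(np.abs(f(K) - H @ C))
print(sup_error)  # small for a smooth target and moderate N
```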

The universal approximation theorem has also been generalized to other neural network architectures [31–33].

A structure-preserving or geometric neural network is a neural network that has additional properties:

Definition

A structure-preserving neural network architecture is a parameter-dependent realization of a function:

\[ \mathrm{sp}\text{-}\mathrm{architecture}: \mathbb{P} \to \mathcal{C}(\mathcal{D}, \mathcal{M}), \Theta \mapsto \mathcal{NN}_\Theta,\]

such that $\mathcal{NN}_\Theta$ preserves some structure.

Example

We say that a neural network is symplectic if $\mathcal{NN}_\Theta:\mathbb{R}^{2n}\to\mathbb{R}^{2m}$ (with $m\geq{}n$) preserves the symplectic structure $\mathbb{J}$, i.e.

\[ (\nabla_z\mathcal{NN}_\Theta)^T\mathbb{J}_{2m}(\nabla_z\mathcal{NN}_\Theta) = \mathbb{J}_{2n},\]

where $z$ are coordinates on $\mathbb{R}^{2n}$.
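As a sanity check of this condition in the case $m = n$, the following sketch builds a single shear layer of the kind used in SympNet-type architectures, $(q, p) \mapsto (q, p + Sq)$ with symmetric $S$, and verifies $(\nabla_z\mathcal{NN}_\Theta)^T\mathbb{J}_{2n}(\nabla_z\mathcal{NN}_\Theta) = \mathbb{J}_{2n}$ numerically. The block convention for $\mathbb{J}_{2n}$ and all code names are assumptions made for this illustration.

```python
import numpy as np

n = 3
rng = np.random.default_rng(0)

# Symmetric matrix S defining the shear layer (q, p) -> (q, p + S q).
S = rng.normal(size=(n, n))
S = 0.5 * (S + S.T)

I, O = np.eye(n), np.zeros((n, n))
J = np.block([[O, I], [-I, O]])      # assumed convention for J_{2n}

# The shear layer is linear in z = (q, p), so its Jacobian is the
# constant block matrix [[I, 0], [S, I]].
D = np.block([[I, O], [S, I]])

# Symplecticity check: D^T J D should equal J.
print(np.allclose(D.T @ J @ D, J))   # True
```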

If we have $m = n$ we can use SympNets to realize such architectures; SympNets are furthermore universal approximators for the set of canonical symplectic maps[3] [5]. If $m \neq n$ we can use symplectic autoencoders to realize such an architecture. A different class of neural networks is that of volume-preserving neural networks:

Example

We say that a neural network is volume-preserving if $\mathcal{NN}_\Theta:\mathbb{R}^n\to\mathbb{R}^n$ is such that:

\[ \det(\nabla_z\mathcal{NN}_\Theta) = 1,\]

where $z$ are coordinates on $\mathbb{R}^n$.

Note that here the dimension is kept constant, unlike in the symplectic case above. Volume-preserving neural networks can be built on the basis of feedforward neural networks or transformers.
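A minimal sketch of this property, assuming an additive coupling layer of the form $(z_1, z_2) \mapsto (z_1, z_2 + g(z_1))$ (the layer and all names are illustrative, not the building blocks of any specific volume-preserving architecture): its Jacobian is lower block-triangular with identity blocks on the diagonal, so its determinant is one, which the code checks with a finite-difference Jacobian.

```python
import numpy as np

n = 4                                # total dimension; split into two halves of size n // 2
rng = np.random.default_rng(0)
W = rng.normal(size=(n // 2, n // 2))
b = rng.normal(size=n // 2)

def nn(z):
    """Additive coupling layer: (z1, z2) -> (z1, z2 + tanh(W z1 + b))."""
    z1, z2 = z[: n // 2], z[n // 2 :]
    return np.concatenate([z1, z2 + np.tanh(W @ z1 + b)])

def jacobian_fd(f, z, eps=1e-6):
    """Central finite-difference approximation of the Jacobian of f at z."""
    cols = [(f(z + eps * e) - f(z - eps * e)) / (2 * eps) for e in np.eye(len(z))]
    return np.stack(cols, axis=1)

z = rng.normal(size=n)
print(np.linalg.det(jacobian_fd(nn, z)))  # approximately 1.0
```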

References

[6]
S. Greydanus, M. Dzamba and J. Yosinski. Hamiltonian neural networks. Advances in Neural Information Processing Systems 32 (2019).
[34]
J. W. Burby, Q. Tang and R. Maulik. Fast neural Poincaré maps for toroidal magnetic fields. Plasma Physics and Controlled Fusion 63, 024001 (2020).
[35]
P. Horn, V. Saz Ulibarrena, B. Koren and S. Portegies Zwart. A Generalized Framework of Neural Networks for Hamiltonian Systems. SSRN preprint SSRN:4555181 (2023).
[36]
E. Celledoni, M. J. Ehrhardt, C. Etmann, R. I. McLachlan, B. Owren, C.-B. Schönlieb and F. Sherry. Structure-preserving deep learning. European Journal of Applied Mathematics 32, 888–936 (2021).
  • 1. One exception is Grassmann learning, where we learn a vector space.
  • 2. We obtain one-layer feedforward neural networks by identifying $\mathbb{P} = \mathbb{R}^{N\times{}n}\times\mathbb{R}^{N}\times\mathbb{R}^{m\times{}N}\ni(A, b, C) =: \Theta$ and $\mathcal{NN}_\Theta(x) = C\sigma(Ax + b)$ for some scalar function $\sigma:\mathbb{R}\to\mathbb{R}$ that is non-polynomial.
  • 3. Other neural network architectures that were developed with the same aim are Hamiltonian neural networks [6], Hénon nets [34] and generalized Hamiltonian neural networks [35]. An overview of structure-preserving neural networks is given in [36].