SympNet Layers
The SympNet paper [24] discusses three different kinds of SympNet layers: activation layers, linear layers and gradient layers. We discuss them below; because activation layers are just a simplified form of gradient layers, those two are discussed together. We call a neural network that consists of many of these layers a SympNet.
SympNet Gradient Layer
The SympNet gradient layer (called GradientLayer
in GeometricMachineLearning
) is based on the following theorem:
Given a symplectic vector space $\mathbb{R}^{2n}$ with coordinates $q_1, \ldots, q_n, p_1, \ldots, p_n$ and a function $f:\mathbb{R}^n\to\mathbb{R}$ that only acts on the $q$ part, the map $(q, p) \mapsto (q, p + \nabla_qf)$ is symplectic. A similar statement holds if $f$ only acts on the $p$ part.
Proof
Proving this is straightforward by looking at the Jacobian of the mapping:
\[ \begin{pmatrix} \mathbb{I} & \mathbb{O} \\ \nabla_q^2f & \mathbb{I} \end{pmatrix},\]
where ``\nabla_q^2f`` is the Hessian of ``f``, which is symmetric. For any symmetric matrix ``A`` we have that:
```math
\begin{pmatrix}
\mathbb{I} & \mathbb{O} \\
A & \mathbb{I}
\end{pmatrix}^T \mathbb{J}_{2n}
\begin{pmatrix}
\mathbb{I} & \mathbb{O} \\
A & \mathbb{I}
\end{pmatrix} =
\begin{pmatrix}
\mathbb{I} & A \\
\mathbb{O} & \mathbb{I}
\end{pmatrix}
\begin{pmatrix}
\mathbb{O} & \mathbb{I} \\
-\mathbb{I} & \mathbb{O}
\end{pmatrix}
\begin{pmatrix}
\mathbb{I} & \mathbb{O} \\
A & \mathbb{I}
\end{pmatrix} =
\begin{pmatrix}
\mathbb{O} & \mathbb{I} \\
-\mathbb{I} & \mathbb{O}
\end{pmatrix} = \mathbb{J}_{2n}.
```
Thus symplecticity is shown.
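The computation above can be checked numerically. The following standalone NumPy sketch (illustrative only, not the package API) verifies that the shear matrix $\begin{pmatrix} \mathbb{I} & \mathbb{O} \\ A & \mathbb{I} \end{pmatrix}$ satisfies the symplecticity condition $M^T\mathbb{J}_{2n}M = \mathbb{J}_{2n}$ for a randomly drawn symmetric $A$:

```python
import numpy as np

# Standalone check (not the package API): for a symmetric A, the shear
# matrix M = [[I, 0], [A, I]] satisfies M^T J M = J, where
# J = [[0, I], [-I, 0]] is the Poisson matrix. Hence M is symplectic.
n = 3
rng = np.random.default_rng(0)
S = rng.standard_normal((n, n))
A = S + S.T  # a symmetric matrix

I, O = np.eye(n), np.zeros((n, n))
M = np.block([[I, O], [A, I]])
J = np.block([[O, I], [-I, O]])

print(np.allclose(M.T @ J @ M, J))  # True: the shear is symplectic
```

Replacing the symmetric $A$ with a non-symmetric matrix makes the check fail, which is exactly the point of the proof: symmetry of the Hessian is what makes the layer symplectic.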
For a GSympNet
, this function $f$ is
\[ f(q) = a^T \Sigma(Kq + b),\]
where $a, b\in\mathbb{R}^m$, $K\in\mathbb{R}^{m\times{}n}$ and $\Sigma$ is the antiderivative of some common activation function $\sigma$. We routinely refer to $m$ as the upscaling dimension in GeometricMachineLearning
. Computing the gradient of $f$ gives:
\[ [\nabla_qf]_k = \sum_{i=1}^m a_i \sigma\left(\sum_{j=1}^nk_{ij}q_j + b_i\right)k_{ik}, \quad \text{i.e.} \quad \nabla_qf = K^T \left(a \odot \sigma(Kq + b)\right),\]
where $\odot$ is the element-wise product, i.e. $[a\odot{}v]_k = a_kv_k$. This is the form that gradient layers take. In addition to gradient layers, GeometricMachineLearning
also implements linear and activation layers. Activation layers are simplified versions of gradient layers, obtained by setting $m = n$ and $K = \mathbb{I}.$
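The gradient-layer map and its symplecticity can be illustrated end to end. The sketch below (NumPy, not the package API) implements $(q, p) \mapsto (q, p + K^T(a \odot \sigma(Kq + b)))$ with $\sigma = \tanh$, whose antiderivative is $\Sigma = \log\cosh$, and confirms via a finite-difference Jacobian that the map satisfies the symplectic condition:

```python
import numpy as np

# Illustrative sketch (not the package API): a gradient layer acting on p,
#   (q, p) -> (q, p + K^T (a * sigma(K q + b))),
# with sigma = tanh, so the underlying potential f(q) = a^T Sigma(K q + b)
# uses Sigma = log(cosh), the antiderivative of tanh.
n, m = 2, 5  # system dimension n and upscaling dimension m
rng = np.random.default_rng(1)
K = rng.standard_normal((m, n))
a, b = rng.standard_normal(m), rng.standard_normal(m)

def layer(z):
    q, p = z[:n], z[n:]
    return np.concatenate([q, p + K.T @ (a * np.tanh(K @ q + b))])

# Central finite-difference Jacobian D of the layer at a random point,
# then check the symplectic condition D^T J D = J.
z0, eps = rng.standard_normal(2 * n), 1e-6
D = np.column_stack([(layer(z0 + eps * e) - layer(z0 - eps * e)) / (2 * eps)
                     for e in np.eye(2 * n)])
I, O = np.eye(n), np.zeros((n, n))
J = np.block([[O, I], [-I, O]])
print(np.allclose(D.T @ J @ D, J, atol=1e-6))  # True: the layer is symplectic
```

Note that the check works for any $K$, $a$ and $b$: symplecticity of a gradient layer is a property of its structure, not of its (trained) parameter values.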
SympNet Linear Layer
Linear layers of type $q$ are of the form:
\[\begin{pmatrix} q \\ p \end{pmatrix} \mapsto \begin{pmatrix} \mathbb{I} & \mathbb{O} \\ A & \mathbb{I} \end{pmatrix} \begin{pmatrix} q \\ p \end{pmatrix},\]
where $A$ is a symmetric matrix. This is implemented very efficiently in GeometricMachineLearning
with the special matrix SymmetricMatrix
.
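The efficiency gain comes from storing only the $n(n+1)/2$ independent entries of the symmetric matrix $A$. A minimal NumPy sketch of this idea (mirroring the role of SymmetricMatrix, but not the package implementation) is:

```python
import numpy as np

# Sketch of a linear q-type layer (not the package implementation): the
# symmetric matrix A is stored as its n(n+1)/2 lower-triangular entries,
# mirroring the parameter savings of SymmetricMatrix.
n = 3
rng = np.random.default_rng(2)
params = rng.standard_normal(n * (n + 1) // 2)  # one entry per (i, j), i >= j

A = np.zeros((n, n))
A[np.tril_indices(n)] = params
A = A + np.tril(A, -1).T  # mirror the strict lower triangle -> symmetric A

# Apply the layer [[I, 0], [A, I]] to a phase-space point (q, p).
q, p = rng.standard_normal(n), rng.standard_normal(n)
q_new, p_new = q, p + A @ q
```

So the layer has $n(n+1)/2$ trainable parameters instead of $n^2$, and symmetry of $A$ is guaranteed by construction rather than enforced through the loss.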
Library Functions
GeometricMachineLearning.SympNetLayer
— Type
Implements the various layers from the SympNet paper [24]. This is a supertype of GradientLayer
, ActivationLayer
and LinearLayer
.
For the linear layer, the activation and the bias are left out, and for the activation layer $K$ and $b$ are left out!
GeometricMachineLearning.GradientLayer
— Type
GradientLayer
is the struct
corresponding to the constructors GradientLayerQ
and GradientLayerP
. See those for more information.
GeometricMachineLearning.GradientLayerQ
— Type
GradientLayerQ(n, upscaling_dimension, activation)
Make an instance of a gradient-$q$ layer.
The gradient layer that changes the $q$ component. It is of the form:
\[\begin{bmatrix} \mathbb{I} & \nabla{}V \\ \mathbb{O} & \mathbb{I} \end{bmatrix},\]
with $V(p) = \sum_{i=1}^Ma_i\Sigma(\sum_jk_{ij}p_j+b_i)$, where $\Sigma$ is the antiderivative of the activation function $\sigma$ (one-layer neural network). We refer to $M$ as the upscaling dimension. Such layers are by construction symplectic.
GeometricMachineLearning.GradientLayerP
— Type
GradientLayerP(n, upscaling_dimension, activation)
Make an instance of a gradient-$p$ layer.
The gradient layer that changes the $p$ component. It is of the form:
\[\begin{bmatrix} \mathbb{I} & \mathbb{O} \\ \nabla{}V & \mathbb{I} \end{bmatrix},\]
with $V(q) = \sum_{i=1}^Ma_i\Sigma(\sum_jk_{ij}q_j+b_i)$, where $\Sigma$ is the antiderivative of the activation function $\sigma$ (one-layer neural network). We refer to $M$ as the upscaling dimension. Such layers are by construction symplectic.
References
- [24] P. Jin, Z. Zhang, A. Zhu, Y. Tang and G. E. Karniadakis. SympNets: Intrinsic structure-preserving symplectic networks for identifying Hamiltonian systems. Neural Networks 132, 166–179 (2020).