SympNet Layers
The SympNet paper [24] discusses three different kinds of SympNet layers: activation layers, linear layers and gradient layers. We discuss them below; because activation layers are just a simplified form of gradient layers, those two are discussed together. We call a neural network that consists of many of these layers a SympNet.
SympNet Gradient Layer
The SympNet gradient layer (called GradientLayer
in GeometricMachineLearning
) is based on the following theorem:
Given a symplectic vector space $\mathbb{R}^{2n}$ with coordinates $q_1, \ldots, q_n, p_1, \ldots, p_n$ and a function $f:\mathbb{R}^n\to\mathbb{R}$ that only acts on the $q$ part, the map $(q, p) \mapsto (q, p + \nabla_qf)$ is symplectic. A similar statement holds if $f$ only acts on the $p$ part.
Proof
Proving this is straightforward by looking at the Jacobian of the mapping:
\[ \begin{pmatrix} \mathbb{I} & \mathbb{O} \\ \nabla_q^2f & \mathbb{I} \end{pmatrix},\]
where ``\nabla_q^2f`` is the Hessian of ``f``, which is symmetric. For any symmetric matrix ``A`` we have that:
```math
\begin{pmatrix}
\mathbb{I} & \mathbb{O} \\
A & \mathbb{I}
\end{pmatrix}^T \mathbb{J}_{2n}
\begin{pmatrix}
\mathbb{I} & \mathbb{O} \\
A & \mathbb{I}
\end{pmatrix} =
\begin{pmatrix}
\mathbb{I} & A \\
\mathbb{O} & \mathbb{I}
\end{pmatrix}
\begin{pmatrix}
\mathbb{O} & \mathbb{I} \\
-\mathbb{I} & \mathbb{O}
\end{pmatrix}
\begin{pmatrix}
\mathbb{I} & \mathbb{O} \\
A & \mathbb{I}
\end{pmatrix} =
\begin{pmatrix}
\mathbb{O} & \mathbb{I} \\
-\mathbb{I} & \mathbb{O}
\end{pmatrix} = \mathbb{J}_{2n}.
```
Thus symplecticity is shown.
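The computation above can be checked numerically. The following standalone NumPy sketch (illustrative only, not the package API) verifies that the shear matrix $\begin{pmatrix} \mathbb{I} & \mathbb{O} \\ A & \mathbb{I} \end{pmatrix}$ satisfies the symplecticity condition $M^T\mathbb{J}_{2n}M = \mathbb{J}_{2n}$ for a randomly drawn symmetric $A$:

```python
import numpy as np

# Standalone check (not the package API): for a symmetric A, the shear
# matrix M = [[I, 0], [A, I]] satisfies M^T J M = J, where
# J = [[0, I], [-I, 0]] is the Poisson matrix. Hence M is symplectic.
n = 3
rng = np.random.default_rng(0)
S = rng.standard_normal((n, n))
A = S + S.T  # a symmetric matrix

I, O = np.eye(n), np.zeros((n, n))
M = np.block([[I, O], [A, I]])
J = np.block([[O, I], [-I, O]])

print(np.allclose(M.T @ J @ M, J))  # True: the shear is symplectic
```

Replacing the symmetric $A$ with a non-symmetric matrix makes the check fail, which is exactly the point of the proof: symmetry of the Hessian is what makes the layer symplectic.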
For a GSympNet
, this function $f$ is
\[ f(q) = a^T \Sigma(Kq + b),\]
where $a, b\in\mathbb{R}^m$, $K\in\mathbb{R}^{m\times{}n}$ and $\Sigma$ is the antiderivative of some common activation function $\sigma$. We routinely refer to $m$ as the upscaling dimension in GeometricMachineLearning
. Computing the gradient of $f$ gives:
\[ [\nabla_qf]_k = \sum_{i=1}^m a_i \sigma\left(\sum_{j=1}^nk_{ij}q_j + b_i\right)k_{ik}, \quad \text{i.e.} \quad \nabla_qf = K^T \left(a \odot \sigma(Kq + b)\right),\]
where $\odot$ is the element-wise product, i.e. $[a\odot{}v]_k = a_kv_k$. This is the form that gradient layers take. In addition to gradient layers, GeometricMachineLearning
also implements linear and activation layers. Activation layers are simplified versions of gradient layers, obtained by setting $m = n$ and $K = \mathbb{I}.$
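The gradient-layer map and its symplecticity can be illustrated end to end. The sketch below (NumPy, not the package API) implements $(q, p) \mapsto (q, p + K^T(a \odot \sigma(Kq + b)))$ with $\sigma = \tanh$, whose antiderivative is $\Sigma = \log\cosh$, and confirms via a finite-difference Jacobian that the map satisfies the symplectic condition:

```python
import numpy as np

# Illustrative sketch (not the package API): a gradient layer acting on p,
#   (q, p) -> (q, p + K^T (a * sigma(K q + b))),
# with sigma = tanh, so the underlying potential f(q) = a^T Sigma(K q + b)
# uses Sigma = log(cosh), the antiderivative of tanh.
n, m = 2, 5  # system dimension n and upscaling dimension m
rng = np.random.default_rng(1)
K = rng.standard_normal((m, n))
a, b = rng.standard_normal(m), rng.standard_normal(m)

def layer(z):
    q, p = z[:n], z[n:]
    return np.concatenate([q, p + K.T @ (a * np.tanh(K @ q + b))])

# Central finite-difference Jacobian D of the layer at a random point,
# then check the symplectic condition D^T J D = J.
z0, eps = rng.standard_normal(2 * n), 1e-6
D = np.column_stack([(layer(z0 + eps * e) - layer(z0 - eps * e)) / (2 * eps)
                     for e in np.eye(2 * n)])
I, O = np.eye(n), np.zeros((n, n))
J = np.block([[O, I], [-I, O]])
print(np.allclose(D.T @ J @ D, J, atol=1e-6))  # True: the layer is symplectic
```

Note that the check works for any $K$, $a$ and $b$: symplecticity of a gradient layer is a property of its structure, not of its (trained) parameter values.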
SympNet Linear Layer
Linear layers of type $q$ are of the form:
\[\begin{pmatrix} q \\ p \end{pmatrix} \mapsto \begin{pmatrix} \mathbb{I} & \mathbb{O} \\ A & \mathbb{I} \end{pmatrix} \begin{pmatrix} q \\ p \end{pmatrix},\]
where $A$ is a symmetric matrix. This is implemented very efficiently in GeometricMachineLearning
with the special matrix SymmetricMatrix
.
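The efficiency gain comes from storing only the $n(n+1)/2$ independent entries of the symmetric matrix $A$. A minimal NumPy sketch of this idea (mirroring the role of SymmetricMatrix, but not the package implementation) is:

```python
import numpy as np

# Sketch of a linear q-type layer (not the package implementation): the
# symmetric matrix A is stored as its n(n+1)/2 lower-triangular entries,
# mirroring the parameter savings of SymmetricMatrix.
n = 3
rng = np.random.default_rng(2)
params = rng.standard_normal(n * (n + 1) // 2)  # one entry per (i, j), i >= j

A = np.zeros((n, n))
A[np.tril_indices(n)] = params
A = A + np.tril(A, -1).T  # mirror the strict lower triangle -> symmetric A

# Apply the layer [[I, 0], [A, I]] to a phase-space point (q, p).
q, p = rng.standard_normal(n), rng.standard_normal(n)
q_new, p_new = q, p + A @ q
```

So the layer has $n(n+1)/2$ trainable parameters instead of $n^2$, and symmetry of $A$ is guaranteed by construction rather than enforced through the loss.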
Library Functions
GeometricMachineLearning.SympNetLayer
— Type
Implements the various layers from the SympNet paper [24]. This is a supertype of GradientLayer
, ActivationLayer
and LinearLayer
.
For the linear layer, the activation and the bias are left out, and for the activation layer $K$ and $b$ are left out!
GeometricMachineLearning.GradientLayer
— Type
GradientLayer
is the struct
corresponding to the constructors GradientLayerQ
and GradientLayerP
. See those for more information.
GeometricMachineLearning.GradientLayerQ
— Type
GradientLayerQ(n, upscaling_dimension, activation)
Make an instance of a gradient-$q$ layer.
The gradient layer that changes the $q$ component. It is of the form:
\[\begin{bmatrix} \mathbb{I} & \nabla{}V \\ \mathbb{O} & \mathbb{I} \end{bmatrix},\]
with $V(p) = \sum_{i=1}^Ma_i\Sigma(\sum_jk_{ij}p_j+b_i)$, where $\Sigma$ is the antiderivative of the activation function $\sigma$ (one-layer neural network). We refer to $M$ as the upscaling dimension. Such layers are by construction symplectic.
GeometricMachineLearning.GradientLayerP
— Type
GradientLayerP(n, upscaling_dimension, activation)
Make an instance of a gradient-$p$ layer.
The gradient layer that changes the $p$ component. It is of the form:
\[\begin{bmatrix} \mathbb{I} & \mathbb{O} \\ \nabla{}V & \mathbb{I} \end{bmatrix},\]
with $V(q) = \sum_{i=1}^Ma_i\Sigma(\sum_jk_{ij}q_j+b_i)$, where $\Sigma$ is the antiderivative of the activation function $\sigma$ (one-layer neural network). We refer to $M$ as the upscaling dimension. Such layers are by construction symplectic.
References
- [24] P. Jin, Z. Zhang, A. Zhu, Y. Tang and G. E. Karniadakis. SympNets: Intrinsic structure-preserving symplectic networks for identifying Hamiltonian systems. Neural Networks 132, 166–179 (2020).