The Symplectic Autoencoder

Symplectic autoencoders offer a structure-preserving way of mapping a high-dimensional system to a low-dimensional system. Concretely this means that if we obtain a reduced system by means of a symplectic autoencoder, this system will again be symplectic; we can thus model a symplectic FOM with a symplectic ROM.

The architecture is represented by the figure below[1]:

A visualization of the symplectic autoencoder architecture. It is a composition of SympNet layers and PSD-like layers.

It is a composition of SympNet gradient layers and PSD-like matrices, so a matrix $A_i$ (respectively $A_i^+$) is of the form

\[ A_i^{(+)} = \begin{bmatrix} \Phi_i & \mathbb{O} \\ \mathbb{O} & \Phi_i \end{bmatrix} \text{ where }\begin{cases} \Phi_i\in{}St(d_{i},d_{i+1})\subset\mathbb{R}^{d_{i+1}\times{}d_i} & \text{if $d_{i+1} > d_i$} \\ \Phi_i\in{}St(d_{i+1},d_{i})\subset\mathbb{R}^{d_{i}\times{}d_{i+1}} & \text{if $d_i > d_{i+1}$}, \end{cases}\]

where $A_i^{(+)} = A_i$ if $d_{i+1} > d_i$ and $A_i^{(+)} = A_i^+$ if $d_{i+1} < d_i.$ Also note that for cotangent lift-like matrices we have

\[\begin{aligned} A_i^+ = \mathbb{J}_{2n} A_i^T \mathbb{J}_{2N}^T & = \begin{bmatrix} \mathbb{O}_{n\times{}n} & \mathbb{I}_n \\ -\mathbb{I}_n & \mathbb{O}_{n\times{}n} \end{bmatrix} \begin{bmatrix} \Phi_i^T & \mathbb{O}_{n\times{}N} \\ \mathbb{O}_{n\times{}N} & \Phi_i^T \end{bmatrix} \begin{bmatrix} \mathbb{O}_{N\times{}N} & - \mathbb{I}_N \\ \mathbb{I}_N & \mathbb{O}_{N\times{}N} \end{bmatrix} \\ & = \begin{bmatrix} \Phi_i^T & \mathbb{O}_{n\times{}N} \\ \mathbb{O}_{n\times{}N} & \Phi_i^T \end{bmatrix} = A_i^T, \end{aligned}\]

so the symplectic inverse is equivalent to the matrix transpose in this case. In the symplectic autoencoder we use SympNets as a form of symplectic preprocessing before the linear symplectic reduction (i.e. the PSD layer) is applied. The resulting neural network has some of its weights on manifolds, which is why we cannot use standard neural network optimizers but have to resort to manifold optimizers. Manifold optimization is only needed for the weights of the PSD-like layers; the weights of the SympNet layers are still updated with standard neural network optimizers during training. Also note that SympNets are nonlinear and preserve symplecticity but cannot change the dimension of a system, whereas PSD layers can change the dimension and preserve symplecticity but are strictly linear. Symplectic autoencoders combine all three properties: they preserve symplecticity, can change the dimension and are nonlinear mappings. We can visualize this in a Venn diagram:

Venn diagram visualizing that a symplectic autoencoder (SAE) is symplectic, can change dimension and is nonlinear.
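To make the identity $A^+ = A^T$ for cotangent-lift-like matrices concrete, here is a minimal Julia sketch (independent of GeometricMachineLearning; the names Φ, A, Jmat and Aplus are ad hoc) that builds such a matrix from a Φ with orthonormal columns and checks the identity numerically:

using LinearAlgebra

# Build a PSD-like (cotangent-lift) matrix from Φ with orthonormal columns, i.e. Φ ∈ St(n, N),
# and verify that its symplectic inverse coincides with its transpose.
N, n = 10, 3
Φ = Matrix(qr(randn(N, n)).Q)[:, 1:n]            # Φ'Φ ≈ I
A = [Φ zeros(N, n); zeros(N, n) Φ]               # 2N × 2n cotangent lift

Jmat(k) = [zeros(k, k) I(k); -I(k) zeros(k, k)]  # canonical Poisson tensor J_{2k}

Aplus = Jmat(n) * A' * Jmat(N)'                  # symplectic inverse J_{2n} Aᵀ J_{2N}ᵀ
@assert Aplus ≈ A'                               # A⁺ = Aᵀ for cotangent lifts
@assert A' * Jmat(N) * A ≈ Jmat(n)               # A is symplectic: Aᵀ J_{2N} A = J_{2n}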

We now prove the identity $\nabla_{\mathcal{R}(z)}\psi = (\nabla_{z}\mathcal{R})^+$ that was used when showing the equivalence between Hamiltonian systems on the full and the reduced space:

Proof

The symplectic autoencoder is a composition of $G$-SympNet layers and PSD-like matrices:

\[\Psi^d = A_n\circ\psi_n\circ\cdots\circ{}A_1\circ\psi_1.\]

Its local inverse is

\[(\Psi^d)^{-1} = \psi_1^{-1}\circ{}A_1^+\circ\ldots\circ\psi_n^{-1}\circ{}A_n^+.\]

The Jacobian of $\Psi^d$ is:

\[\nabla_z\Psi^d = A_n\nabla_{(A_{n-1}\circ\psi_{n-1}\circ\cdots\circ{}A_1\circ\psi_1)(z)}\psi_n\cdots{}A_1\nabla_z\psi_1,\]

and thus

\[(\nabla_z\Psi^d)^+ = (\nabla\psi_1)^+A_1^+\cdots(\nabla\psi_n)^+A_n^+,\]

where we dropped the argument in the derivative of the nonlinear parts. We further have

\[A^+ = A^T\]

for PSD-like matrices and

\[(\nabla_z\psi)^+ = \begin{pmatrix} \mathbb{O} & \mathbb{I} \\ -\mathbb{I} & \mathbb{O} \end{pmatrix} \begin{pmatrix} \mathbb{I} & \nabla_pf \\ \mathbb{O} & \mathbb{I} \end{pmatrix}^T \begin{pmatrix} \mathbb{O} & -\mathbb{I} \\ \mathbb{I} & \mathbb{O} \end{pmatrix} = \begin{pmatrix} \mathbb{I} & -\nabla_pf \\ \mathbb{O} & \mathbb{I} \end{pmatrix},\]

for the $G$-SympNet layers, where we assumed that $\psi$ only changes the $q$ component (note that $\nabla_pf$ is symmetric because it is the Hessian of a scalar potential in $p$). Because these matrices are square, the symplectic inverse coincides with the unique ordinary inverse, $(\nabla\psi)^+ = (\nabla\psi)^{-1}$. Combining the two identities above with the chain rule shows that $(\nabla_z\Psi^d)^+ = (\nabla\psi_1)^{-1}A_1^+\cdots(\nabla\psi_n)^{-1}A_n^+$ is precisely the Jacobian of $(\Psi^d)^{-1}$ evaluated at $\Psi^d(z)$, which proves the claim.
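As a quick numerical illustration (a standalone sketch; the weights K, a, b and the activation are ad hoc and not taken from the library), one can assemble the symmetric block $\nabla_pf$ of a $q$-type gradient layer and check that the symplectic inverse of its Jacobian is indeed its ordinary inverse:

using LinearAlgebra

# The Jacobian of a q-type gradient layer has block form [I D; O I], where
# D = ∇_p f is symmetric because it is the Hessian of a scalar potential in p.
n = 4
K = randn(3n, n); a = randn(3n); b = randn(3n); p = randn(n)
dtanh(x) = 1 - tanh(x)^2
D = K' * Diagonal(a .* dtanh.(K * p .+ b)) * K        # symmetric n × n block

Jac = [I(n) D; zeros(n, n) I(n)]                      # ∇_z ψ
J2n = [zeros(n, n) I(n); -I(n) zeros(n, n)]           # J_{2n}

@assert J2n * Jac' * J2n' ≈ [I(n) (-D); zeros(n, n) I(n)]  # the symplectic inverse ...
@assert J2n * Jac' * J2n' ≈ inv(Jac)                       # ... equals the actual inverse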

The SympNet layers in the symplectic autoencoder operate in intermediate dimensions (as well as the input and output dimensions). In the following we explain how GeometricMachineLearning computes those intermediate dimensions.

Intermediate Dimensions

For a high-fidelity system of dimension $2N$ and a reduced system of dimension $2n$, the intermediate dimensions in the symplectic encoder and the decoder are computed according to:

# N = full_dim ÷ 2 and n = reduced_dim ÷ 2 are the half dimensions;
# full_dim2 is another name for N and n_blocks is the number of blocks in the encoder (or decoder).
iterations = Vector{Int}(n : (N - n) ÷ (n_blocks - 1) : N)
iterations[end] = full_dim2
iterations * 2

For example, for $2N = 100$, $2n = 10$ and $\mathtt{n\_blocks} = 3$ we get

\[\mathrm{iterations} = 5\mathtt{:}(45 \div 2)\mathtt{:}50 = 5\mathtt{:}22\mathtt{:}50 = (5, 27, 49).\]

We still have to perform the two other modifications in the algorithm above:

  1. iterations[end] = full_dim2 $\ldots$ assign full_dim2 to the last entry,
  2. iterations * 2 $\ldots$ multiply all the intermediate dimensions by two.

The resulting dimensions are:

\[(10, 54, 100).\]

The second step (the multiplication by two) is needed to arrive at intermediate dimensions that are even. This is necessary to preserve the canonical symplectic structure of the system.
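The following short snippet (a standalone sketch, not the library's internal routine) reproduces the three steps for this example:

# Reproduce the intermediate dimensions for 2N = 100, 2n = 10 and n_blocks = 3.
N, n, n_blocks = 50, 5, 3
full_dim2 = N                                            # half of the full dimension

iterations = Vector{Int}(n:(N - n) ÷ (n_blocks - 1):N)   # [5, 27, 49]
iterations[end] = full_dim2                              # [5, 27, 50]
println(iterations * 2)                                  # prints [10, 54, 100]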

Example

A visualization of an instance of SymplecticAutoencoder is shown below:

Example of a symplectic autoencoder. The SympNet layers are in green, the PSD-like layers are in blue.

In this figure we have the following configuration: n_encoder_blocks is two, n_encoder_layers is four, n_decoder_blocks is three and n_decoder_layers is two. For a full dimension of 100 and a reduced dimension of ten we can build such an instance of a symplectic autoencoder by calling:

using GeometricMachineLearning

const full_dim = 100
const reduced_dim = 10

model = SymplecticAutoencoder(full_dim, reduced_dim;
                              n_encoder_blocks = 2,
                              n_encoder_layers = 4,
                              n_decoder_blocks = 3,
                              n_decoder_layers = 2)

for layer in Chain(model)
    println(stdout, layer)
end
GradientLayerQ{100, 100, typeof(tanh)}(500, tanh)
GradientLayerP{100, 100, typeof(tanh)}(500, tanh)
GradientLayerQ{100, 100, typeof(tanh)}(500, tanh)
GradientLayerP{100, 100, typeof(tanh)}(500, tanh)
PSDLayer{100, 10}()
GradientLayerQ{10, 10, typeof(tanh)}(50, tanh)
GradientLayerP{10, 10, typeof(tanh)}(50, tanh)
PSDLayer{10, 54}()
GradientLayerQ{54, 54, typeof(tanh)}(270, tanh)
GradientLayerP{54, 54, typeof(tanh)}(270, tanh)
PSDLayer{54, 100}()

We also see that the intermediate dimension in the decoder is 54 for the specified dimensions and n_decoder_blocks = 3 as was outlined before.

Library Functions

GeometricMachineLearning.SymplecticAutoencoder — Type
SymplecticAutoencoder(full_dim, reduced_dim)

Make an instance of SymplecticAutoencoder for dimensions full_dim and reduced_dim.

The architecture

The symplectic autoencoder architecture was introduced in [3]. Like any other autoencoder it consists of an encoder $\Psi^e:\mathbb{R}^{2N}\to\mathbb{R}^{2n}$ and a decoder $\Psi^d:\mathbb{R}^{2n}\to\mathbb{R}^{2N}$ with $n\ll{}N$. These satisfy the following properties:

\[\begin{aligned} \nabla_z\Psi^e\mathbb{J}_{2N}(\nabla_z\Psi^e)^T = \mathbb{J}_{2n} & \quad\text{and} \\ (\nabla_\xi\Psi^d)^T\mathbb{J}_{2N}\nabla_\xi\Psi^d = \mathbb{J}_{2n}. & \end{aligned}\]

Because the decoder has this particular property, the reduced system can be described by the Hamiltonian $H\circ\Psi^d$:

\[\mathbb{J}_{2n}\nabla_\xi(H\circ\Psi^d) = \mathbb{J}_{2n}(\nabla_\xi\Psi^d)^T\nabla_{\Psi^d(\xi)}H = \mathbb{J}_{2n}(\nabla_\xi\Psi^d)^T\mathbb{J}_{2N}^T\mathbb{J}_{2N}\nabla_{\Psi^d(\xi)}H = (\nabla_\xi\Psi^d)^+X_H(\Psi^d(\xi)),\]

where $(\nabla_\xi\Psi^d)^+$ is the symplectic inverse of $\nabla_\xi\Psi^d$ (for more details see the docs on the AutoEncoder type).
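Spelled out as an evolution equation (this merely restates the chain of equalities above), the reduced system induced by the decoder reads:

\[\dot{\xi} = \mathbb{J}_{2n}\nabla_\xi(H\circ\Psi^d)(\xi) = (\nabla_\xi\Psi^d)^+X_H(\Psi^d(\xi)).\]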

Arguments

Besides the required arguments full_dim and reduced_dim you can provide the following keyword arguments:

  • n_encoder_layers::Integer = 4: The number of layers in one encoder block.
  • n_encoder_blocks::Integer = 2: The number of encoder blocks.
  • n_decoder_layers::Integer = 1: The number of layers in one decoder block.
  • n_decoder_blocks::Integer = 3: The number of decoder blocks.
  • sympnet_upscale::Integer = 5: The upscaling dimension of the GSympNet. See GradientLayerQ and GradientLayerP.
  • activation = tanh: The activation in the gradient layers.
  • encoder_init_q::Bool = true: Specifies if the first layer in each encoder block should be of $q$ type.
  • decoder_init_q::Bool = true: Specifies if the first layer in each decoder block should be of $q$ type.

References

[3]
B. Brantner and M. Kraus. Symplectic Autoencoders for Model Reduction of Hamiltonian Systems, arXiv preprint arXiv:2312.10004 (2023).
[1] For the symplectic autoencoder we only use SympNet gradient layers because they seem to outperform $LA$-SympNets in many cases and are easier to interpret: their nonlinear part is the gradient of a function that only depends on half the coordinates.