SympNets with GeometricMachineLearning
This page serves as a short introduction to using SympNets with GeometricMachineLearning.
Loss function
The FeedForwardLoss is the default loss used in GeometricMachineLearning for training SympNets; it can, however, be changed or tweaked.
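To illustrate the idea (this is a conceptual sketch, not the library's internal implementation), a feedforward-type loss simply compares the network's one-step prediction with the next snapshot in the data set, e.g. as a relative error; the helper name below is hypothetical:
using LinearAlgebra: norm

# conceptual sketch of a feedforward-type loss (hypothetical helper, not part of the library):
# the network's one-step prediction is compared with the next point in the data set
feedforward_loss_sketch(prediction::AbstractArray, target::AbstractArray) =
    norm(prediction - target) / norm(target)   # relative error in the Euclidean norm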
Training a Harmonic Oscillator
Here we begin with a simple example, the harmonic oscillator, the Hamiltonian of which is
\[H:(q,p)\in\mathbb{R}^2 \mapsto \frac{1}{2}p^2 + \frac{1}{2}q^2 \in \mathbb{R}.\]
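The associated Hamiltonian equations of motion, which the integrator solves below, are
\[\dot{q} = \frac{\partial H}{\partial p} = p, \qquad \dot{p} = -\frac{\partial H}{\partial q} = -q.\]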
We take the ODE for this system from GeometricProblems and integrate it with GeometricIntegrators [2]:
using GeometricMachineLearning # provides DataLoader, the SympNet architectures and the optimizers used below
import GeometricProblems.HarmonicOscillator as ho
using GeometricIntegrators: ImplicitMidpoint, integrate
# the problem is the ODE of the harmonic oscillator
ho_problem = ho.hodeproblem(; tspan = 500)
# integrate the system
solution = integrate(ho_problem, ImplicitMidpoint())
We call DataLoader in order to conveniently handle the data:
dl_raw = DataLoader(solution; suppress_info = true)
We have not yet specified the type and backend that we want to use. We do this now:
# specify the data type and the backend
type = Float16
backend = CPU()
# we can then make a new instance of `DataLoader` with this backend and type.
dl = DataLoader(dl_raw, backend, type)
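As a quick sanity check (the field names are taken from the calls further below; the ordering of the dimensions is an assumption), the loaded data are stored as three-dimensional arrays for the $q$- and $p$-components:
# inspect the shape of the q-component of the data;
# assumed ordering: (system dimension, number of time steps, number of trajectories)
size(dl.input.q)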
Next we specify the architectures[1]:
const upscaling_dimension = 2
const nhidden = 1
const activation = tanh
const n_layers = 4 # number of layers for the G-SympNet
const depth = 4 # number of layers in each linear block in the LA-SympNet
# calling G-SympNet architecture
gsympnet = GSympNet(dl; upscaling_dimension = upscaling_dimension,
n_layers = n_layers,
activation = activation)
# calling LA-SympNet architecture
lasympnet = LASympNet(dl; nhidden = nhidden,
activation = activation,
depth = depth)
# initialize the networks
la_nn = NeuralNetwork(lasympnet, backend, type)
g_nn = NeuralNetwork(gsympnet, backend, type)
If we want to obtain information on the number of parameters in a neural network, we can do that with the function parameterlength. For the LASympNet:
parameterlength(la_nn.model)
14
And for the GSympNet:
parameterlength(g_nn.model)
12
We can also specify whether we would like to start with a layer that changes the $q$-component or one that changes the $p$-component. This can be done via the keywords init_upper for the GSympNet, and init_upper_linear and init_upper_act for the LASympNet.
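For example, a hedged sketch of constructing a G-SympNet that starts with the other type of layer (the precise effect of setting init_upper to false is an assumption here):
# same G-SympNet as above, but starting with the other layer type
# (reading init_upper = false as "begin with the layer acting on p" is an assumption)
gsympnet_alt = GSympNet(dl; upscaling_dimension = upscaling_dimension,
                            n_layers = n_layers,
                            activation = activation,
                            init_upper = false)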
We have to define an optimizer which will be used for training the SympNets. In this example we use Adam:
# set up optimizer; for this we first need to specify the optimization method
opt_method = AdamOptimizer(type)
# we then call the optimizer struct which allocates the cache
la_opt = Optimizer(opt_method, la_nn)
g_opt = Optimizer(opt_method, g_nn)
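For reference, the standard (Euclidean) Adam update with learning rate $\eta$, decay rates $\beta_1, \beta_2$, small constant $\delta$ and gradient $g_t$ reads (GeometricMachineLearning may adapt this for structured network weights; the formula is only included as background):
\[m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t \odot g_t, \qquad \theta_t = \theta_{t-1} - \eta\,\frac{m_t / (1 - \beta_1^t)}{\sqrt{v_t / (1 - \beta_2^t)} + \delta},\]
where the division and the square root are applied elementwise.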
We can now perform the training of the neural networks:
# determine the batch size (the number of samples in one batch)
const batch_size = 16
batch = Batch(batch_size)
# number of training epochs
const nepochs = 100
# perform training (returns array that contains the total loss for each training step)
g_loss_array = g_opt(g_nn, dl, batch, nepochs; show_progress = false)
la_loss_array = la_opt(la_nn, dl, batch, nepochs; show_progress = false)
We plot the training errors against the epoch (here the $y$-axis is in log-scale):
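Since the figure itself is not reproduced here, a hedged plotting sketch (assuming the Plots.jl package is available) could look as follows:
using Plots: plot, plot!

# plot the training loss of both SympNets against the epoch on a logarithmic y-axis
loss_plot = plot(g_loss_array; yaxis = :log, xlabel = "epoch", ylabel = "training loss", label = "G-SympNet")
plot!(loss_plot, la_loss_array; label = "LA-SympNet")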
Now we can make a prediction. We compare the initial data with a prediction starting from the same phase space point using the function GeometricMachineLearning.iterate:
ics = (q=dl.input.q[:, 1, 1], p=dl.input.p[:, 1, 1])
steps_to_plot = 200
#predictions
la_trajectory = iterate(la_nn, ics; n_points = steps_to_plot)
g_trajectory = iterate(g_nn, ics; n_points = steps_to_plot)
We now plot the result:
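Again the figure is not reproduced here; a hedged sketch of such a phase-space plot (assuming iterate returns a named tuple with q and p arrays laid out like the initial condition) might be:
# phase-space plot: reference trajectory versus the two SympNet predictions
trajectory_plot = plot(dl.input.q[1, 1:steps_to_plot, 1], dl.input.p[1, 1:steps_to_plot, 1];
                       xlabel = "q", ylabel = "p", label = "reference")
plot!(trajectory_plot, vec(la_trajectory.q), vec(la_trajectory.p); label = "LA-SympNet")
plot!(trajectory_plot, vec(g_trajectory.q), vec(g_trajectory.p); label = "G-SympNet")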
We see that GSympNet outperforms LASympNet on this problem; the blue line (reference) and the orange line ($G$-SympNet) are in fact almost indistinguishable.
We have actually never observed a scenario in which the $LA$-SympNet outperforms the $G$-SympNet. The $G$-SympNet usually trains faster, is more accurate, and is less sensitive to the chosen hyperparameters and to the initialization of the weights. It is also more straightforward to interpret. We therefore use the $G$-SympNet as a basis for the linear symplectic transformer.
Comparison with a ResNet
We want to show the advantages of using a SympNet over a standard ResNet that is not symplectic. For this we make a ResNet with a similar number of parameters to the two SympNets:
resnet = ResNet(dl, n_layers ÷ 2; activation = activation)
rn_nn = NeuralNetwork(resnet, backend, type)
parameterlength(rn_nn)
18
We now train the network $\ldots$
rn_opt = Optimizer(opt_method, rn_nn)
rn_loss_array = rn_opt(rn_nn, dl, batch, nepochs; show_progress = false)
and plot the loss:
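As before, the figure is omitted here; continuing the earlier hedged Plots.jl sketch, the ResNet loss could be added with:
plot!(loss_plot, rn_loss_array; label = "ResNet")   # add the ResNet training loss to the earlier loss plot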
And we see that the loss is significantly lower than for the $LA$-SympNet, but slightly higher than for the $G$-SympNet. We can also plot the prediction:
rn_trajectory = iterate(rn_nn, ics; n_points = steps_to_plot)
We see that the ResNet is slowly gaining energy, which constitutes unphysical behavior. If we let this simulation run for even longer, this effect gets more pronounced:
steps_to_plot = 800
#predictions
la_trajectory = iterate(la_nn, ics; n_points = steps_to_plot)
g_trajectory = iterate(g_nn, ics; n_points = steps_to_plot)
rn_trajectory = iterate(rn_nn, ics; n_points = steps_to_plot)
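One way to make this energy drift visible numerically is to evaluate the Hamiltonian along the predicted trajectories (a hedged sketch; the layout of the trajectory arrays is assumed to be as above):
# Hamiltonian of the harmonic oscillator
H(q, p) = (p^2 + q^2) / 2

# energy along the ResNet and G-SympNet trajectories; for the ResNet this sequence drifts upwards
energy_rn = [H(rn_trajectory.q[1, i], rn_trajectory.p[1, i]) for i in axes(rn_trajectory.q, 2)]
energy_g  = [H(g_trajectory.q[1, i],  g_trajectory.p[1, i])  for i in axes(g_trajectory.q, 2)]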
The behavior the ResNet exhibits is characteristic of integration schemes that do not preserve structure: the error in a single time step can be made very small, but for long-time simulations one typically has to consider symplecticity or other properties. Also note that the curves produced by the $LA$-SympNet and the $G$-SympNet are closed (or nearly closed). This is a property of symplectic maps in two dimensions that is preserved by construction [1].
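For completeness: a map $\phi: \mathbb{R}^2 \to \mathbb{R}^2$ is symplectic if its Jacobian satisfies
\[(\nabla\phi)^T \mathbb{J} (\nabla\phi) = \mathbb{J}, \qquad \mathbb{J} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},\]
which in two dimensions is equivalent to preservation of area and orientation.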