Adjusting the Loss Function

GeometricMachineLearning provides a few standard loss functions that are used as defaults for specific neural networks:

- FeedForwardLoss (used for standard feedforward networks such as SympNets),
- AutoEncoderLoss (the default for networks of type AutoEncoder),
- TransformerLoss (the default for transformer networks).

If these standard losses do not satisfy the user's needs, it is very easy to implement custom loss functions. We again consider training a SympNet on data coming from a harmonic oscillator:

using GeometricMachineLearning
using GeometricIntegrators: integrate, ImplicitMidpoint
using GeometricProblems.HarmonicOscillator: hodeproblem
import Random
Random.seed!(123)

data = integrate(hodeproblem(; tspan = 100), ImplicitMidpoint()) |> DataLoader

nn = NeuralNetwork(GSympNet(2))

o = Optimizer(AdamOptimizer(), nn)

batch = Batch(32)

n_epochs = 30

loss = FeedForwardLoss()

loss_array = o(nn, data, batch, n_epochs, loss)

print(loss_array[end])
[ Info: You have provided a NamedTuple with keys q and p; the data are matrices. This is interpreted as *symplectic data*.

Progress:   7%|██▊                                      |  ETA: 0:03:23
  TrainingLoss:  0.12488902759671226


Progress: 100%|█████████████████████████████████████████| Time: 0:00:16
  TrainingLoss:  0.0025667289897467697
0.0025667289897467697

And we see that the loss goes down to a very low value. But the user might also want to penalize the norm of the network parameters:

using LinearAlgebra: norm

# norm of parameters for single layer
network_parameter_norm(params::NamedTuple) = sum([norm(params[i]) for i in 1:length(params)])
# norm of parameters for entire network
network_parameter_norm(params) = sum([network_parameter_norm(param) for param in params])

network_parameter_norm(nn.params)
4.637679510234861

We now implement a custom loss of the form:

\[ \mathrm{loss}_\mathcal{NN}^\mathrm{custom}(\mathrm{input}, \mathrm{output}) = \mathrm{loss}_\mathcal{NN}^\mathrm{feedforward}(\mathrm{input}, \mathrm{output}) + \lambda\, \mathrm{norm}(\mathcal{NN}\mathtt{.params}),\]

where we fix the regularization parameter \(\lambda = 0.1\):

struct CustomLoss <: GeometricMachineLearning.NetworkLoss end

function (loss::CustomLoss)(model::Chain, params::Tuple, input::CT, output::CT) where {
                                                            T,
                                                            AT<:AbstractArray{T, 3},
                                                            CT<:@NamedTuple{q::AT, p::AT}
                                                            }
    FeedForwardLoss()(model, params, input, output) + 0.1 * network_parameter_norm(params)
end

loss = CustomLoss()

nn_custom = NeuralNetwork(GSympNet(2))

loss_array = o(nn_custom, data, batch, n_epochs, loss)

print(loss_array[end])

Progress:   7%|██▊                                      |  ETA: 0:00:29
  TrainingLoss:  1.6860468728994746


Progress:  20%|████████▎                                |  ETA: 0:00:09
  TrainingLoss:  0.9460796213071907


Progress:  33%|█████████████▋                           |  ETA: 0:00:05
  TrainingLoss:  0.6397170824326305


Progress:  47%|███████████████████▏                     |  ETA: 0:00:03
  TrainingLoss:  0.4993670427746995


Progress:  60%|████████████████████████▋                |  ETA: 0:00:02
  TrainingLoss:  0.47484101254591493


Progress:  73%|██████████████████████████████▏          |  ETA: 0:00:01
  TrainingLoss:  0.443575121167243


Progress:  87%|███████████████████████████████████▌     |  ETA: 0:00:00
  TrainingLoss:  0.40705628962541285


Progress: 100%|█████████████████████████████████████████| Time: 0:00:02
  TrainingLoss:  0.36431485858976587
0.36431485858976587

And we see that the norm of the parameters is now considerably lower:

network_parameter_norm(nn_custom.params)
3.5511322877258724

We can also compare the trajectories predicted by the two networks:

using CairoMakie

# `textcolor` is not defined in this snippet; the documentation build sets it
# according to the page theme. We pick a plain color here so the code runs as-is.
textcolor = :black

fig = Figure(; backgroundcolor = :transparent)
ax = Axis(fig[1, 1]; backgroundcolor = :transparent,
    bottomspinecolor = textcolor,
    topspinecolor = textcolor,
    leftspinecolor = textcolor,
    rightspinecolor = textcolor,
    xtickcolor = textcolor,
    ytickcolor = textcolor)

init_con = [0.5, 0.0]
n_time_steps = 100
prediction1 = zeros(2, n_time_steps + 1)
prediction2 = zeros(2, n_time_steps + 1)
prediction1[:, 1] = init_con
prediction2[:, 1] = init_con

for i in 2:(n_time_steps + 1)
    prediction1[:, i] = nn(prediction1[:, i - 1])
    prediction2[:, i] = nn_custom(prediction2[:, i - 1])
end

lines!(ax, data.input.q[:], data.input.p[:], label = rich("Training Data"; color = textcolor))
lines!(ax, prediction1[1, :], prediction1[2, :], label = rich("FeedForwardLoss"; color = textcolor))
lines!(ax, prediction2[1, :], prediction2[2, :], label = rich("CustomLoss"; color = textcolor))

# display the legend; without this call the labels above would not be shown
axislegend(ax)

fig

Library Functions

GeometricMachineLearning.NetworkLoss (Type)

An abstract type for all the neural network losses. If you want to implement CustomLoss <: NetworkLoss you need to define a functor:

    (loss::CustomLoss)(model, ps, input, output)

where model is an instance of an AbstractExplicitLayer or a Chain and ps the parameters.
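
For illustration, here is a minimal sketch of such a functor (ScaledLoss is a hypothetical name, not part of the library); it simply rescales the feedforward loss used earlier in this section:

# hypothetical example: a loss that rescales the standard feedforward loss
struct ScaledLoss <: GeometricMachineLearning.NetworkLoss end

function (loss::ScaledLoss)(model, ps, input, output)
    2 * FeedForwardLoss()(model, ps, input, output)
end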

GeometricMachineLearning.AutoEncoderLoss (Type)

This loss should always be used together with a neural network of type AutoEncoder (and it is also the default for training such a network).

It simply computes:

\[\mathtt{AutoEncoderLoss}(nn, input) = ||nn(input) - input||.\]
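
As a plain-Julia sketch of what this formula computes (reconstruction_error is a hypothetical helper, not a library function):

using LinearAlgebra: norm

# the reconstruction error ||nn(input) - input|| for a generic map nn
reconstruction_error(nn, input) = norm(nn(input) - input)

reconstruction_error(identity, rand(5, 10)) # the identity map reconstructs perfectly, so this is 0.0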

GeometricMachineLearning.TransformerLoss (Type)

    TransformerLoss(seq_length, prediction_window)

Make an instance of the transformer loss.

This is the loss for a transformer network (especially a transformer integrator).

Parameters

The prediction_window specifies how many time steps are predicted into the future. It defaults to the value specified for seq_length.
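
A short usage sketch based on the signature above (assuming the single-argument constructor applies this default):

loss_explicit = TransformerLoss(5, 3) # predict 3 time steps from input sequences of length 5
loss_default  = TransformerLoss(5)    # prediction_window defaults to seq_length, i.e. 5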
