Proper Orthogonal Decomposition
Proper orthogonal decomposition (POD, [66]) is perhaps the most widely used technique for data-driven reduced order modeling. POD approximates both the reduction and the reconstruction through linear maps. Assume that the full discretized space has dimension $N$ and that we model the solution manifold with an $n$-dimensional subspace. POD then models the reduction $\mathcal{P}:\mathbb{R}^N\to\mathbb{R}^n$ through a matrix in $\mathbb{R}^{n\times{}N}$ and the reconstruction $\mathcal{R}:\mathbb{R}^n\to\mathbb{R}^N$ through a matrix in $\mathbb{R}^{N\times{}n}$. If we are given a snapshot matrix, finding $\mathcal{P}$ and $\mathcal{R}$ amounts to a simple application of the singular value decomposition (SVD).
Given a snapshot matrix $M = [u_1, \ldots, u_\mathtt{nts}] \in\mathbb{R}^{N\times\mathtt{nts}},$ where $\mathtt{nts}$ is the number of time steps, the $n$-dimensional linear subspace that best approximates the data stored in $M$ is spanned by the first $n$ columns of the matrix $V$ in an SVD: $M = VDU^T.$ The problem of finding this subspace can either be phrased as a maximization problem:
\[ \max_{\psi_1, \ldots, \psi_n\in\mathbb{R}^N} \sum_{i = 1}^n \sum_{j = 1}^{\mathtt{nts}}| \langle u_j, \psi_i \rangle_{\mathbb{R}^N} |^2 \text{ s.t. $\langle \psi_i, \psi_j \rangle = \delta_{ij}$ for $1 \leq i$, $j \leq n$,}\]
or as a minimization problem:
\[ \min_{\psi_1, \ldots, \psi_n\in\mathbb{R}^N} \sum_{j = 1}^{\mathtt{nts}} | u_j - \sum_{i = 1}^n \psi_i\langle u_j, \psi_i \rangle |^2\text{ s.t. $\langle \psi_i, \psi_j \rangle = \delta_{ij}$ for $1 \leq i$, $j \leq n$.}\]
In both these cases we have
\[ [\psi_1, \psi_2, \ldots, \psi_n] = V\mathtt{[1:N, 1:n]},\]
where $V$ is obtained via an SVD of $M$.
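To make this concrete, here is a minimal plain-Julia sketch (not library code; the snapshot data are random placeholders) of how the POD basis is extracted from a snapshot matrix. Note that Julia's svd stores the left singular vectors, i.e. the matrix called $V$ above, in the field U:

using LinearAlgebra

N, n, nts = 100, 10, 500       # full dimension, reduced dimension, number of time steps
M = randn(N, nts)              # snapshot matrix (placeholder data instead of real snapshots)

V = svd(M).U[:, 1:n]           # the first n left singular vectors span the optimal subspace

V' * V ≈ Matrix(I, n, n)       # the POD basis is orthonormal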
A proof of this statement can be found in, e.g., [67]. The reduced equations are then obtained via Galerkin projection:
Consider a full-order model on $\mathbb{R}^N$ described by the vector field ${\hat{u}}'(t) = X(\hat{u}(t))$. For a POD basis, i.e. the truncated matrix $V\mathtt{[1:N, 1:n]}$, which we again denote by $V\in\mathbb{R}^{N\times{}n}$, the reduced vector field obtained via Galerkin projection is:
\[ u'(t) = V^TX(Vu(t)),\]
where we used $\{\tilde{\psi}_i = Ve_i\}_{i = 1,\ldots, n}$ as test functions. $e_i\in\mathbb{R}^n$ is the vector that is zero everywhere except for the $i$-th entry, where it is one.
Proof
Inserting the ansatz $\hat{u}(t) = Vu(t)$ and taking $\tilde{\psi}_i = Ve_i$ as test function, we get:
\[ e_i^TV^TX(Vu(t)) \overset{!}{=} e_i^TV^TVu'(t) = e_i^Tu'(t),\]
where the last equality uses the orthonormality of the POD basis, $V^TV = \mathbb{I}$. Since this must be true for every $i = 1, \ldots, n$ we obtain the desired expression for the reduced vector field.
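As a follow-up to the sketch above, the Galerkin projection can be written down directly once a POD basis is available. In the snippet below the linear full-order vector field $X(\hat{u}) = A\hat{u}$ and the random snapshots are made up for illustration:

using LinearAlgebra

N, n = 100, 10
V = svd(randn(N, 500)).U[:, 1:n]   # POD basis, computed as in the sketch above

A = randn(N, N)                    # hypothetical linear full-order model û'(t) = A*û(t)
X(û) = A * û

X_reduced(u) = V' * X(V * u)       # Galerkin-projected reduced vector field u'(t) = Vᵀ X(V u(t))
length(X_reduced(randn(n))) == n   # the reduced dynamics live in ℝⁿ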
In recent years another approach to modeling $\mathcal{P}$ and $\mathcal{R}$ has become popular: using neural networks for this task.
Autoencoders
Autoencoders are a popular tool in machine learning for data compression [43]. The idea is to find a low-dimensional representation of high-dimensional data; this is also referred to as learning a feature space. This idea lends itself straightforwardly to an application in reduced order modeling. In this setting the two mappings are modeled with neural networks:
An autoencoder is a tuple of two mappings $(\mathcal{P}, \mathcal{R})$ called the reduction and the reconstruction:
- The reduction $\mathcal{P}:\mathbb{R}^N\to\mathbb{R}^n$ is modeled with a neural network that maps high-dimensional data to a low-dimensional feature space. This network is also referred to as the encoder and we routinely denote it by $\Psi^\mathrm{enc}_{\theta_1}$ to stress the parameter-dependence on $\theta_1$.
- The reconstruction $\mathcal{R}:\mathbb{R}^n\to\mathbb{R}^N$ is modeled with a neural network that maps inputs from the low-dimensional feature space to the high-dimensional space in which the original data were collected. This network is also referred to as the decoder and we routinely denote it by $\Psi^\mathrm{dec}_{\theta_2}$ to stress the parameter-dependence on $\theta_2$.
During training we optimize the autoencoder to minimize the projection error. Unlike in the POD case, where the optimal basis follows directly from an SVD, we have to resort to neural network optimizers to adapt the encoder and decoder to the data at hand. Using autoencoders instead of POD can be significantly advantageous for problems that exhibit a slowly-decaying Kolmogorov $n$-width.
Note that POD can be seen as a special case of an autoencoder where the encoder and the decoder both consist of only one matrix. If we restrict this matrix to be orthonormal, i.e. optimize on the Stiefel manifold, then the best solution we can obtain is equivalent to applying SVD and finding the POD basis.
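To make the training objective concrete, the following is a minimal plain-Julia sketch (not the GeometricMachineLearning API; the one-layer encoder and decoder and the random snapshot data are made up for illustration) of the projection error that is driven down during training:

using LinearAlgebra

N, n = 10, 2                                    # full and reduced dimension
W_enc, b_enc = randn(n, N) / sqrt(N), zeros(n)  # encoder parameters θ₁
W_dec, b_dec = randn(N, n) / sqrt(n), zeros(N)  # decoder parameters θ₂

Ψenc(u) = tanh.(W_enc * u .+ b_enc)             # reduction  P: ℝᴺ → ℝⁿ
Ψdec(z) = W_dec * tanh.(z) .+ b_dec             # reconstruction  R: ℝⁿ → ℝᴺ

M = randn(N, 100)                               # snapshot matrix (placeholder data)

# relative projection error; an optimizer would adjust θ₁ and θ₂ to minimize this
projection_error = norm(mapreduce(u -> Ψdec(Ψenc(u)), hcat, eachcol(M)) - M) / norm(M)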
The Reduced Equations for the Autoencoder
Analogously to the POD case, we obtain a reduced vector field when we reduce with an autoencoder:
Consider a full-order model on $\mathbb{R}^N$ described by the vector field ${\hat{u}}'(t) = X(\hat{u}(t))$. If we reduce with an autoencoder $(\Psi^\mathrm{enc}, \Psi^\mathrm{dec})$ we obtain a reduced vector field via Galerkin projection:
\[ u'(t) = (\nabla\Psi^\mathrm{dec})^+X(\Psi^\mathrm{dec}(u(t))),\]
where $(\nabla\Psi^\mathrm{dec})^+ = \left((\nabla\Psi^\mathrm{dec})^T\nabla\Psi^\mathrm{dec}\right)^{-1}(\nabla\Psi^\mathrm{dec})^T$ is the Moore-Penrose pseudoinverse of the decoder Jacobian and we used $\{\tilde{\psi}_i = (\nabla\Psi^\mathrm{dec})e_i\}_{i = 1,\ldots, n}$ as test functions. $e_i\in\mathbb{R}^n$ is the vector that is zero everywhere except for the $i$-th entry, where it is one.
Proof
Inserting the ansatz $\hat{u}(t) = \Psi^\mathrm{dec}(u(t))$, for which $\hat{u}'(t) = (\nabla\Psi^\mathrm{dec})u'(t)$, and taking $\tilde{\psi}_i = (\nabla\Psi^\mathrm{dec})e_i$ as test function, we get:
\[ e_i^T(\nabla\Psi^\mathrm{dec})^TX(\Psi^\mathrm{dec}(u(t))) \overset{!}{=} e_i^T(\nabla\Psi^\mathrm{dec})^T(\nabla\Psi^\mathrm{dec})u'(t),\]
and since this must be true for every $i = 1, \ldots, n$, we have $(\nabla\Psi^\mathrm{dec})^T(\nabla\Psi^\mathrm{dec})u'(t) = (\nabla\Psi^\mathrm{dec})^TX(\Psi^\mathrm{dec}(u(t)))$. Solving for $u'(t)$ (assuming the Jacobian has full column rank) gives the pseudoinverse expression for the reduced vector field.
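As a plain-Julia illustration of this formula, the toy decoder below mirrors the sketch above, so its Jacobian is available in closed form; for a general decoder network one would obtain the Jacobian via automatic differentiation, and the linear full-order vector field is again hypothetical:

using LinearAlgebra

N, n = 10, 2
W_dec, b_dec = randn(N, n) / sqrt(n), zeros(N)

Ψdec(z) = W_dec * tanh.(z) .+ b_dec        # toy decoder ℝⁿ → ℝᴺ
∇Ψdec(z) = W_dec * Diagonal(sech.(z).^2)   # its Jacobian in closed form

A = randn(N, N)                            # hypothetical full-order model û'(t) = A*û(t)
X(û) = A * û

# reduced vector field u'(t) = (∇Ψdec)⁺ X(Ψdec(u(t)))
X_reduced(u) = pinv(∇Ψdec(u)) * X(Ψdec(u))
length(X_reduced(randn(n))) == n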
Both POD and standard autoencoders suffer from the problem that they completely neglect the structure of the differential equation and the data they are applied to. This can have grave consequences [68–70]. Hamiltonian model order reduction can improve the approximation significantly in these situations.
Library Functions
GeometricMachineLearning.AutoEncoder — Type
AutoEncoder <: Architecture
The abstract AutoEncoder type.
An autoencoder [43] is a neural network consisting of an encoder $\Psi^e$ and a decoder $\Psi^d$. In the simplest case they are trained on some data set $\mathcal{D}$ to reduce the following error:
\[||\Psi^d\circ\Psi^e(\mathcal{D}) - \mathcal{D}||,\]
which we call the reconstruction error, projection error or autoencoder error (see the docs for AutoEncoderLoss), where $||\cdot||$ is some norm.
Implementation
AutoEncoder is an abstract type. If a custom <:AutoEncoder architecture is implemented it should have the fields full_dim, reduced_dim, n_encoder_blocks and n_decoder_blocks. n_encoder_blocks and n_decoder_blocks indicate how often the dimension is changed in the encoder (respectively the decoder). Further, the routines encoder and decoder should be extended.
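As an illustration of these requirements, a custom architecture could be declared as follows (a hypothetical sketch: MyAutoEncoder is not part of the library, and a real architecture would typically carry additional hyperparameters as extra fields):

using GeometricMachineLearning: AutoEncoder

struct MyAutoEncoder <: AutoEncoder
    full_dim::Int
    reduced_dim::Int
    n_encoder_blocks::Int
    n_decoder_blocks::Int
end

arch = MyAutoEncoder(100, 5, 2, 2)   # reduce from dimension 100 to dimension 5
# in addition, the routines encoder and decoder would have to be extended for MyAutoEncoder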
GeometricMachineLearning.Encoder — Type
Encoder <: Architecture
The abstract Encoder type.
Most often this should not be called directly, but rather through the encoder function.
Implementation
If a custom <:Encoder architecture is implemented it should have the fields full_dim, reduced_dim and n_encoder_blocks.
GeometricMachineLearning.Decoder — Type
Decoder <: Architecture
The abstract Decoder type.
Most often this should not be called directly, but rather through the decoder function.
Implementation
If a custom <:Decoder architecture is implemented it should have the fields full_dim, reduced_dim and n_decoder_blocks.
GeometricMachineLearning.UnknownEncoder — Type
UnknownEncoder(full_dim, reduced_dim, n_encoder_blocks)
Make an instance of UnknownEncoder.
This should be used if one wants to use an Encoder that does not have any specific structure.
Examples
We show how to make an encoder from a custom architecture:
using GeometricMachineLearning
using GeometricMachineLearning: UnknownEncoder, params
model = Chain(Dense(5, 3, tanh; use_bias = false), Dense(3, 2, identity; use_bias = false))
nn = NeuralNetwork(UnknownEncoder(5, 2, 2), model, params(NeuralNetwork(model)), CPU())
typeof(nn) <: NeuralNetwork{<:GeometricMachineLearning.Encoder}
# output
true
GeometricMachineLearning.UnknownDecoder — Type
UnknownDecoder(full_dim, reduced_dim, n_decoder_blocks)
Make an instance of UnknownDecoder.
This should be used if one wants to use a Decoder that does not have any specific structure.
An example of using this can be constructed analogously to UnknownEncoder.
GeometricMachineLearning.encoder — Function
encoder(nn::NeuralNetwork{<:AutoEncoder})
Obtain the encoder from an AutoEncoder neural network.
encoder(nn)
Make a neural network of type Encoder out of an arbitrary neural network.
Implementation
Internally this allocates a new neural network of type UnknownEncoder and takes the parameters and the backend from nn.
GeometricMachineLearning.decoder — Function
decoder(nn::NeuralNetwork{<:AutoEncoder})
Obtain the decoder from an AutoEncoder neural network.
decoder(nn)
Make a neural network of type Decoder out of an arbitrary neural network.
Implementation
Internally this allocates a new neural network of type UnknownDecoder and takes the parameters and the backend from nn.
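A hedged usage sketch of the second methods listed above, reusing the plain Chain from the UnknownEncoder example; the wrapped networks are backed by UnknownEncoder and UnknownDecoder as described in the implementation notes:

using GeometricMachineLearning
using GeometricMachineLearning: encoder, decoder

model = Chain(Dense(5, 3, tanh; use_bias = false), Dense(3, 2, identity; use_bias = false))
nn = NeuralNetwork(model)

enc = encoder(nn)   # a NeuralNetwork{<:Encoder} that takes the parameters and backend from nn
dec = decoder(nn)   # analogously, a NeuralNetwork{<:Decoder}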
References
- [66] A. Chatterjee. An introduction to the proper orthogonal decomposition. Current Science, 808–817 (2000).