Proper Orthogonal Decomposition
Proper orthogonal decomposition (POD, [66]) is perhaps the most widely used technique for data-driven reduced order modeling. POD approximates both the reduction and the reconstruction through linear maps. Assume that the full discretized space has dimension $N$ and that we model the solution manifold with an $n$-dimensional subspace. POD then models the reduction $\mathcal{P}:\mathbb{R}^N\to\mathbb{R}^n$ through a matrix in $\mathbb{R}^{n\times{}N}$ and the reconstruction $\mathcal{R}:\mathbb{R}^n\to\mathbb{R}^N$ through a matrix in $\mathbb{R}^{N\times{}n}$. If we are given a snapshot matrix, finding $\mathcal{P}$ and $\mathcal{R}$ amounts to a simple application of the singular value decomposition (SVD).
Given a snapshot matrix $M = [u_1, \ldots, u_\mathtt{nts}] \in\mathbb{R}^{N\times\mathtt{nts}},$ where $\mathtt{nts}$ is the number of time steps, the $n$-dimensional linear subspace that best approximates the data stored in $M$ is spanned by the first $n$ columns of the matrix $V$ in an SVD: $M = VDU^T.$ The problem of finding this subspace can either be phrased as a maximization problem:
\[ \max_{\psi_1, \ldots, \psi_n\in\mathbb{R}^N} \sum_{i = 1}^n \sum_{j = 1}^{\mathtt{nts}}| \langle u_j, \psi_i \rangle_{\mathbb{R}^N} |^2 \text{ s.t. $\langle \psi_i, \psi_j \rangle = \delta_{ij}$ for $1 \leq i$, $j \leq n$,}\]
or as a minimization problem:
\[ \min_{\psi_1, \ldots, \psi_n\in\mathbb{R}^N} \sum_{j = 1}^{\mathtt{nts}} | u_j - \sum_{i = 1}^n \psi_i\langle u_j, \psi_i \rangle |^2\text{ s.t. $\langle \psi_i, \psi_j \rangle = \delta_{ij}$ for $1 \leq i$, $j \leq n$.}\]
In both these cases we have
\[ [\psi_1, \psi_2, \ldots, \psi_n] = V\mathtt{[1:N, 1:n]},\]
where $V$ is obtained via an SVD of $M$.
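To make this concrete, here is a minimal plain-Julia sketch (not library code; the snapshot data are random placeholders) of how the POD basis is extracted from a snapshot matrix. Note that Julia's svd stores the left singular vectors, i.e. the matrix called $V$ above, in the field U:

using LinearAlgebra

N, n, nts = 100, 10, 500       # full dimension, reduced dimension, number of time steps
M = randn(N, nts)              # snapshot matrix (placeholder data instead of real snapshots)

V = svd(M).U[:, 1:n]           # the first n left singular vectors span the optimal subspace

V' * V ≈ Matrix(I, n, n)       # the POD basis is orthonormal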
A proof of this statement can be found in, e.g., [67]. The reduced equations are then obtained via Galerkin projection:
Consider a full-order model on $\mathbb{R}^N$ described by the vector field ${\hat{u}}'(t) = X(\hat{u}(t))$. For a POD basis, i.e. the truncated matrix $V\mathtt{[1:N, 1:n]}$, which we again denote by $V\in\mathbb{R}^{N\times{}n}$, the reduced vector field obtained via Galerkin projection is:
\[ u'(t) = V^TX(Vu(t)),\]
where we used $\{\tilde{\psi}_i = Ve_i\}_{i = 1,\ldots, n}$ as test functions. $e_i\in\mathbb{R}^n$ is the vector that is zero everywhere except for the $i$-th entry, where it is one.
Proof
Inserting the ansatz $\hat{u}(t) = Vu(t)$ and taking $\tilde{\psi}_i = Ve_i$ as test function, we get:
\[ e_i^TV^TX(Vu(t)) \overset{!}{=} e_i^TV^TVu'(t) = e_i^Tu'(t),\]
where the last equality uses the orthonormality of the POD basis, $V^TV = \mathbb{I}$. Since this must be true for every $i = 1, \ldots, n$ we obtain the desired expression for the reduced vector field.
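As a follow-up to the sketch above, the Galerkin projection can be written down directly once a POD basis is available. In the snippet below the linear full-order vector field $X(\hat{u}) = A\hat{u}$ and the random snapshots are made up for illustration:

using LinearAlgebra

N, n = 100, 10
V = svd(randn(N, 500)).U[:, 1:n]   # POD basis, computed as in the sketch above

A = randn(N, N)                    # hypothetical linear full-order model û'(t) = A*û(t)
X(û) = A * û

X_reduced(u) = V' * X(V * u)       # Galerkin-projected reduced vector field u'(t) = Vᵀ X(V u(t))
length(X_reduced(randn(n))) == n   # the reduced dynamics live in ℝⁿ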
In recent years another approach to modeling $\mathcal{P}$ and $\mathcal{R}$ has become popular: using neural networks for this task.
Autoencoders
Autoencoders are a popular tool in machine learning for data compression [43]. The idea is to find a low-dimensional representation of high-dimensional data; this is also referred to as learning a feature space. This idea lends itself straightforwardly to an application in reduced order modeling. In this setting the two mappings are modeled with neural networks:
An autoencoder is a tuple of two mappings $(\mathcal{P}, \mathcal{R})$ called the reduction and the reconstruction:
- The reduction $\mathcal{P}:\mathbb{R}^N\to\mathbb{R}^n$ is modeled with a neural network that maps high-dimensional data to a low-dimensional feature space. This network is also referred to as the encoder and we routinely denote it by $\Psi^\mathrm{enc}_{\theta_1}$ to stress the parameter-dependence on $\theta_1$.
- The reconstruction $\mathcal{R}:\mathbb{R}^n\to\mathbb{R}^N$ is modeled with a neural network that maps inputs from the low-dimensional feature space to the high-dimensional space in which the original data were collected. This network is also referred to as the decoder and we routinely denote it by $\Psi^\mathrm{dec}_{\theta_2}$ to stress the parameter-dependence on $\theta_2$.
During training we optimize the autoencoder to minimize the projection error. Unlike in the POD case, where the optimal basis follows directly from an SVD, we have to resort to neural network optimizers to adapt the encoder and decoder to the data at hand. Using autoencoders instead of POD can be significantly advantageous for problems that exhibit a slowly-decaying Kolmogorov $n$-width.
Note that POD can be seen as a special case of an autoencoder where the encoder and the decoder both consist of only one matrix. If we restrict this matrix to be orthonormal, i.e. optimize on the Stiefel manifold, then the best solution we can obtain is equivalent to applying SVD and finding the POD basis.
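To make the training objective concrete, the following is a minimal plain-Julia sketch (not the GeometricMachineLearning API; the one-layer encoder and decoder and the random snapshot data are made up for illustration) of the projection error that is driven down during training:

using LinearAlgebra

N, n = 10, 2                                    # full and reduced dimension
W_enc, b_enc = randn(n, N) / sqrt(N), zeros(n)  # encoder parameters θ₁
W_dec, b_dec = randn(N, n) / sqrt(n), zeros(N)  # decoder parameters θ₂

Ψenc(u) = tanh.(W_enc * u .+ b_enc)             # reduction  P: ℝᴺ → ℝⁿ
Ψdec(z) = W_dec * tanh.(z) .+ b_dec             # reconstruction  R: ℝⁿ → ℝᴺ

M = randn(N, 100)                               # snapshot matrix (placeholder data)

# relative projection error; an optimizer would adjust θ₁ and θ₂ to minimize this
projection_error = norm(mapreduce(u -> Ψdec(Ψenc(u)), hcat, eachcol(M)) - M) / norm(M)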
The Reduced Equations for the Autoencoder
Analogously to the POD case, we obtain a reduced vector field when we reduce with an autoencoder:
Consider a full-order model on $\mathbb{R}^N$ described by the vector field ${\hat{u}}'(t) = X(\hat{u}(t))$. If we reduce with an autoencoder $(\Psi^\mathrm{enc}, \Psi^\mathrm{dec})$ we obtain a reduced vector field via Galerkin projection:
\[ u'(t) = (\nabla\Psi^\mathrm{dec})^+X(\Psi^\mathrm{dec}(u(t))),\]
where $(\nabla\Psi^\mathrm{dec})^+ = \left((\nabla\Psi^\mathrm{dec})^T\nabla\Psi^\mathrm{dec}\right)^{-1}(\nabla\Psi^\mathrm{dec})^T$ is the Moore-Penrose pseudoinverse of the decoder Jacobian and we used $\{\tilde{\psi}_i = (\nabla\Psi^\mathrm{dec})e_i\}_{i = 1,\ldots, n}$ as test functions. $e_i\in\mathbb{R}^n$ is the vector that is zero everywhere except for the $i$-th entry, where it is one.
Proof
Inserting the ansatz $\hat{u}(t) = \Psi^\mathrm{dec}(u(t))$, for which $\hat{u}'(t) = (\nabla\Psi^\mathrm{dec})u'(t)$, and taking $\tilde{\psi}_i = (\nabla\Psi^\mathrm{dec})e_i$ as test function, we get:
\[ e_i^T(\nabla\Psi^\mathrm{dec})^TX(\Psi^\mathrm{dec}(u(t))) \overset{!}{=} e_i^T(\nabla\Psi^\mathrm{dec})^T(\nabla\Psi^\mathrm{dec})u'(t),\]
and since this must be true for every $i = 1, \ldots, n$, we have $(\nabla\Psi^\mathrm{dec})^T(\nabla\Psi^\mathrm{dec})u'(t) = (\nabla\Psi^\mathrm{dec})^TX(\Psi^\mathrm{dec}(u(t)))$. Solving for $u'(t)$ (assuming the Jacobian has full column rank) gives the pseudoinverse expression for the reduced vector field.
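As a plain-Julia illustration of this formula, the toy decoder below mirrors the sketch above, so its Jacobian is available in closed form; for a general decoder network one would obtain the Jacobian via automatic differentiation, and the linear full-order vector field is again hypothetical:

using LinearAlgebra

N, n = 10, 2
W_dec, b_dec = randn(N, n) / sqrt(n), zeros(N)

Ψdec(z) = W_dec * tanh.(z) .+ b_dec        # toy decoder ℝⁿ → ℝᴺ
∇Ψdec(z) = W_dec * Diagonal(sech.(z).^2)   # its Jacobian in closed form

A = randn(N, N)                            # hypothetical full-order model û'(t) = A*û(t)
X(û) = A * û

# reduced vector field u'(t) = (∇Ψdec)⁺ X(Ψdec(u(t)))
X_reduced(u) = pinv(∇Ψdec(u)) * X(Ψdec(u))
length(X_reduced(randn(n))) == n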
Both POD and standard autoencoders suffer from the problem that they completely neglect the structure of the differential equation and the data they are applied to. This can have grave consequences [68–70]. Hamiltonian model order reduction can improve the approximation significantly in these situations.
Library Functions
GeometricMachineLearning.AutoEncoder — Type
AutoEncoder <: Architecture
The abstract AutoEncoder type.
An autoencoder [43] is a neural network consisting of an encoder $\Psi^e$ and a decoder $\Psi^d$. In the simplest case they are trained on some data set $\mathcal{D}$ to reduce the following error:
\[||\Psi^d\circ\Psi^e(\mathcal{D}) - \mathcal{D}||,\]
which we call the reconstruction error, projection error or autoencoder error (see the docs for AutoEncoderLoss), where $||\cdot||$ is some norm.
Implementation
AutoEncoder is an abstract type. If a custom <:AutoEncoder architecture is implemented it should have the fields full_dim, reduced_dim, n_encoder_blocks and n_decoder_blocks. n_encoder_blocks and n_decoder_blocks indicate how often the dimension is changed in the encoder (respectively the decoder). Further, the routines encoder and decoder should be extended.
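As an illustration of these requirements, a custom architecture could be declared as follows (a hypothetical sketch: MyAutoEncoder is not part of the library, and a real architecture would typically carry additional hyperparameters as extra fields):

using GeometricMachineLearning: AutoEncoder

struct MyAutoEncoder <: AutoEncoder
    full_dim::Int
    reduced_dim::Int
    n_encoder_blocks::Int
    n_decoder_blocks::Int
end

arch = MyAutoEncoder(100, 5, 2, 2)   # reduce from dimension 100 to dimension 5
# in addition, the routines encoder and decoder would have to be extended for MyAutoEncoder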
GeometricMachineLearning.Encoder — Type
Encoder <: Architecture
The abstract Encoder type.
Most often this should not be called directly, but rather through the encoder function.
Implementation
If a custom <:Encoder architecture is implemented it should have the fields full_dim, reduced_dim and n_encoder_blocks.
GeometricMachineLearning.Decoder — Type
Decoder <: Architecture
The abstract Decoder type.
Most often this should not be called directly, but rather through the decoder function.
Implementation
If a custom <:Decoder architecture is implemented it should have the fields full_dim, reduced_dim and n_decoder_blocks.
GeometricMachineLearning.UnknownEncoder — Type
UnknownEncoder(full_dim, reduced_dim, n_encoder_blocks)
Make an instance of UnknownEncoder.
This should be used if one wants to use an Encoder that does not have any specific structure.
Examples
We show how to make an encoder from a custom architecture:
using GeometricMachineLearning
using GeometricMachineLearning: UnknownEncoder, params
model = Chain(Dense(5, 3, tanh; use_bias = false), Dense(3, 2, identity; use_bias = false))
nn = NeuralNetwork(UnknownEncoder(5, 2, 2), model, params(NeuralNetwork(model)), CPU())
typeof(nn) <: NeuralNetwork{<:GeometricMachineLearning.Encoder}
# output
true
GeometricMachineLearning.UnknownDecoder — Type
UnknownDecoder(full_dim, reduced_dim, n_decoder_blocks)
Make an instance of UnknownDecoder.
This should be used if one wants to use a Decoder that does not have any specific structure.
An example of using this can be constructed analogously to UnknownEncoder.
GeometricMachineLearning.encoder — Function
encoder(nn::NeuralNetwork{<:AutoEncoder})
Obtain the encoder from an AutoEncoder neural network.
encoder(nn)
Make a neural network of type Encoder out of an arbitrary neural network.
Implementation
Internally this allocates a new neural network of type UnknownEncoder and takes the parameters and the backend from nn.
GeometricMachineLearning.decoder — Function
decoder(nn::NeuralNetwork{<:AutoEncoder})
Obtain the decoder from an AutoEncoder neural network.
decoder(nn)
Make a neural network of type Decoder out of an arbitrary neural network.
Implementation
Internally this allocates a new neural network of type UnknownDecoder and takes the parameters and the backend from nn.
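A hedged usage sketch of the second methods listed above, reusing the plain Chain from the UnknownEncoder example; the wrapped networks are backed by UnknownEncoder and UnknownDecoder as described in the implementation notes:

using GeometricMachineLearning
using GeometricMachineLearning: encoder, decoder

model = Chain(Dense(5, 3, tanh; use_bias = false), Dense(3, 2, identity; use_bias = false))
nn = NeuralNetwork(model)

enc = encoder(nn)   # a NeuralNetwork{<:Encoder} that takes the parameters and backend from nn
dec = decoder(nn)   # analogously, a NeuralNetwork{<:Decoder}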
References
- [66] A. Chatterjee. An introduction to the proper orthogonal decomposition. Current Science, 808–817 (2000).