Arbitrarily Combining Derivatives
SymbolicNeuralNetworks can compute derivatives of arbitrary order of a neural network. For this we use two structs: Jacobian and Gradient.
Whereas the name Jacobian is standard for the matrix whose entries are all partial derivatives of the output of a function with respect to its input, the name Gradient is typically not used the way it is used here. Normally a gradient collects the partial derivatives of a scalar function. In SymbolicNeuralNetworks the struct Gradient takes all partial derivatives of a symbolic array with respect to all parameters of a neural network. So if we compute the Gradient of a matrix, the corresponding routine returns a matrix of neural network parameters, each entry of which is the standard gradient of the corresponding matrix element. This can be written as:
\[\mathtt{Gradient}\left( \begin{pmatrix} m_{11} & m_{12} & \cdots & m_{1m} \\ m_{21} & m_{22} & \cdots & m_{2m} \\ \vdots & \vdots & \vdots & \vdots \\ m_{n1} & m_{n2} & \cdots & m_{nm} \end{pmatrix} \right) = \begin{pmatrix} \nabla_{\mathbb{P}}m_{11} & \nabla_{\mathbb{P}}m_{12} & \cdots & \nabla_{\mathbb{P}}m_{1m} \\ \nabla_{\mathbb{P}}m_{21} & \nabla_{\mathbb{P}}m_{22} & \cdots & \nabla_{\mathbb{P}}m_{2m} \\ \vdots & \vdots & \vdots & \vdots \\ \nabla_{\mathbb{P}}m_{n1} & \nabla_{\mathbb{P}}m_{n2} & \cdots & \nabla_{\mathbb{P}}m_{nm} \end{pmatrix},\]
where $\mathbb{P}$ denotes the parameters of the neural network. For computational and consistency reasons each element $\nabla_\mathbb{P}m_{ij}$ is stored as a NeuralNetworkParameters instance.
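To make this concrete, here is a minimal sketch that uses the API shown in the remainder of this section; it builds a small network and checks that an entry of the computed gradient is indeed of type NeuralNetworkParameters:
using AbstractNeuralNetworks
using SymbolicNeuralNetworks
using SymbolicNeuralNetworks: Gradient, derivative

# a single dense layer mapping a two-dimensional input to a one-dimensional output
nn = SymbolicNeuralNetwork(Chain(Dense(2, 1, tanh)))
g = Gradient(nn)
# each entry of the derivative is expected to be a NeuralNetworkParameters instance
derivative(g)[1] isa NeuralNetworkParameters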
Jacobian of a Neural Network
SymbolicNeuralNetworks.Jacobian differentiates a symbolic expression with respect to the input arguments of a neural network:
using AbstractNeuralNetworks
using SymbolicNeuralNetworks
using SymbolicNeuralNetworks: Jacobian, Gradient, derivative
using Latexify: latexify
c = Chain(Dense(2, 1, tanh; use_bias = false))
nn = SymbolicNeuralNetwork(c)
□ = Jacobian(nn)
# we show the derivative with respect to the input
derivative(□) |> latexify
\begin{equation}
\left[
\begin{array}{cc}
\mathtt{W\_1}_{1,1} \left( 1 - \tanh^{2}\left( \mathtt{W\_1}_{1,1} \mathtt{sinput}_{1} + \mathtt{W\_1}_{1,2} \mathtt{sinput}_{2} \right) \right) & \mathtt{W\_1}_{1,2} \left( 1 - \tanh^{2}\left( \mathtt{W\_1}_{1,1} \mathtt{sinput}_{1} + \mathtt{W\_1}_{1,2} \mathtt{sinput}_{2} \right) \right) \\
\end{array}
\right]
\end{equation}
Note that the output of nn is one-dimensional and we use the convention
\[\square_{ij} = [\mathrm{jacobian}_{x}f]_{ij} = \frac{\partial}{\partial{}x_j}f_i,\]
so the output has shape $\mathrm{output\_dim}\times\mathrm{input\_dim} = 1\times2$:
size(derivative(□))
(1, 2)
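As a sketch of how this can be checked numerically, we can compile the symbolic Jacobian with build_nn_function (introduced further below); note that applying build_nn_function to derivative(□) and the parameter layout (W = ...,) for the bias-free layer are assumptions here:
using SymbolicNeuralNetworks: build_nn_function, params

# compile the symbolic Jacobian into an executable Julia function
jacobian_function = build_nn_function(derivative(□), params(nn), nn.input)
x = [1., 0.]
# assumed parameter layout for the bias-free dense layer
ps = NeuralNetworkParameters((L1 = (W = [1. 0.],), ))
# under these assumptions this returns a 1×2 matrix ≈ [0.42  0.0],
# i.e. [W[1,1]*(1 - tanh²(W*x))  W[1,2]*(1 - tanh²(W*x))] evaluated at x
jacobian_function(x, ps)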
Gradient of a Neural Network
As described above, SymbolicNeuralNetworks.Gradient differentiates every element of the array-valued output with respect to the neural network parameters:
using SymbolicNeuralNetworks: Gradient
g = Gradient(nn)
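# gradient of the single output component with respect to the weight matrix W of the first layer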
derivative(g)[1].L1.W |> latexify
\begin{equation}
\left[
\begin{array}{cc}
\mathtt{sinput}_{1} \left( 1 - \tanh^{2}\left( \mathtt{W\_1}_{1,1} \mathtt{sinput}_{1} + \mathtt{W\_1}_{1,2} \mathtt{sinput}_{2} \right) \right) & \mathtt{sinput}_{2} \left( 1 - \tanh^{2}\left( \mathtt{W\_1}_{1,1} \mathtt{sinput}_{1} + \mathtt{W\_1}_{1,2} \mathtt{sinput}_{2} \right) \right) \\
\end{array}
\right]
\end{equation}
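Under the same assumptions as in the Jacobian sketch above, the symbolic gradient can be compiled as well; the result is indexed like derivative(g), first by matrix entry and then by layer and parameter name:
# compile the symbolic gradient (same assumptions as in the sketch above)
gradient_function = build_nn_function(derivative(g), params(nn), nn.input)
x = [1., 0.]
ps = NeuralNetworkParameters((L1 = (W = [1. 0.],), ))
# gradient of the single output component with respect to W, evaluated at x;
# from the expression above we expect ≈ [0.42  0.0]
gradient_function(x, ps)[1][:L1][:W]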
Double Derivatives
We can easily differentiate a neural network twice by using SymbolicNeuralNetworks.Jacobian and SymbolicNeuralNetworks.Gradient together. We first use SymbolicNeuralNetworks.Jacobian to differentiate the network output with respect to its input:
using AbstractNeuralNetworks
using SymbolicNeuralNetworks
using SymbolicNeuralNetworks: Jacobian, Gradient, derivative, params
using Latexify: latexify
c = Chain(Dense(2, 1, tanh))
nn = SymbolicNeuralNetwork(c)
□ = Jacobian(nn)
# we show the derivative with respect to the input
derivative(□) |> latexify
\begin{equation}
\left[
\begin{array}{cc}
\mathtt{W\_1}_{1,1} \left( 1 - \tanh^{2}\left( \mathtt{W\_2}_{1} + \mathtt{W\_1}_{1,1} \mathtt{sinput}_{1} + \mathtt{W\_1}_{1,2} \mathtt{sinput}_{2} \right) \right) & \mathtt{W\_1}_{1,2} \left( 1 - \tanh^{2}\left( \mathtt{W\_2}_{1} + \mathtt{W\_1}_{1,1} \mathtt{sinput}_{1} + \mathtt{W\_1}_{1,2} \mathtt{sinput}_{2} \right) \right) \\
\end{array}
\right]
\end{equation}
We see that the output is a matrix of size $\mathrm{output\_dim} \times \mathrm{input\_dim}$. We can further compute the gradients of all entries of this matrix with SymbolicNeuralNetworks.Gradient:
g = Gradient(derivative(□), nn)
So SymbolicNeuralNetworks.Gradient differentiates every element of the matrix with respect to all neural network parameters. In order to access the gradient of the first matrix element with respect to the bias b in the first layer, we write:
matrix_index = (1, 1)
layer = :L1
weight = :b
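# derivative(g) is indexed first by the matrix entry, then by layer and parameter name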
derivative(g)[matrix_index...][layer][weight] |> latexify
\begin{equation}
\left[
\begin{array}{c}
- 2 \mathtt{W\_1}_{1,1} \tanh\left( \mathtt{W\_2}_{1} + \mathtt{W\_1}_{1,1} \mathtt{sinput}_{1} + \mathtt{W\_1}_{1,2} \mathtt{sinput}_{2} \right) \left( 1 - \tanh^{2}\left( \mathtt{W\_2}_{1} + \mathtt{W\_1}_{1,1} \mathtt{sinput}_{1} + \mathtt{W\_1}_{1,2} \mathtt{sinput}_{2} \right) \right) \\
\end{array}
\right]
\end{equation}
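This is exactly what one obtains by differentiating the first Jacobian entry by hand. Writing $z = \mathtt{W\_2}_{1} + \mathtt{W\_1}_{1,1} \mathtt{sinput}_{1} + \mathtt{W\_1}_{1,2} \mathtt{sinput}_{2}$, where the symbolic variable $\mathtt{W\_2}_{1}$ corresponds to the bias b of the first layer, we have
\[\frac{\partial}{\partial \mathtt{W\_2}_{1}} \left[ \mathtt{W\_1}_{1,1} \left( 1 - \tanh^{2}(z) \right) \right] = -2 \mathtt{W\_1}_{1,1} \tanh(z) \left( 1 - \tanh^{2}(z) \right),\]
since $\partial z / \partial \mathtt{W\_2}_{1} = 1$ and $\frac{\mathrm{d}}{\mathrm{d}z}\tanh(z) = 1 - \tanh^{2}(z)$.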
If we now want to obtain an executable Julia function, we have to use build_nn_function. We call this function on:
\[x = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad W = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad b = \begin{bmatrix} 0 \\ 0 \end{bmatrix}\]
built_function = build_nn_function(derivative(g), params(nn), nn.input)
x = [1., 0.]
ps = NeuralNetworkParameters((L1 = (W = [1. 0.; 0. 1.], b = [0., 0.]), ))
built_function(x, ps)[matrix_index...][layer][weight]
1-element Vector{Float64}:
-0.6397000084492246
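This value can be verified by hand: with the parameters above we have $\mathtt{W\_1}_{1,1} = 1$, $\mathtt{W\_1}_{1,2} = 0$ and $\mathtt{W\_2}_{1} = 0$, so $z = 1$ and the expression derived above evaluates to
\[-2 \tanh(1) \left( 1 - \tanh^{2}(1) \right) \approx -0.6397.\]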
With the structs SymbolicNeuralNetworks.Jacobian and SymbolicNeuralNetworks.Gradient and the function build_nn_function, SymbolicNeuralNetworks makes it easy to build combinations of derivatives. This is much harder to achieve with Zygote-based AD.