Description
Recurrent neural networks (RNNs) are a specialized class of artificial neural networks designed to excel at processing sequential data. Unlike traditional feedforward networks, where inputs are treated independently, RNNs possess a unique form of internal memory. This memory, represented by their hidden state, allows them to maintain and update information about previous elements in a sequence. The equations below summarize the behaviour of an RNN.
\[\begin{align*}
\htmlClass{sdt-0000000094}{\mathcal{x}}(n) &= \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000094}{\mathcal{x}}(n-1) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n) + \htmlClass{sdt-0000000082}{\mathcal{b}}) \\
\htmlClass{sdt-0000000068}{\mathbf{y}}(n) &= f(\htmlClass{sdt-0000000059}{\mathbf{W}}^{out} \htmlClass{sdt-0000000094}{\mathcal{x}}(n))
\end{align*}\]
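As a concrete illustration, here is a minimal NumPy sketch of these two equations, assuming a tanh nonlinearity for \( \htmlClass{sdt-0000000079}{\sigma} \), the identity for \(f\), and arbitrary illustrative dimensions (these choices are not prescribed by the equations themselves):

```python
import numpy as np

# Arbitrary illustrative dimensions: K inputs, L hidden neurons, M outputs
K, L, M = 3, 5, 2
T = 10  # sequence length

rng = np.random.default_rng(0)
W_in  = rng.normal(size=(L, K))   # input weight matrix  W^in  (L x K)
W     = rng.normal(size=(L, L))   # recurrent weight matrix W  (L x L)
W_out = rng.normal(size=(M, L))   # output weight matrix  W^out (M x L)
b     = np.zeros(L)               # bias b

x = np.zeros(L)                   # initial state x(0)
for n in range(1, T + 1):
    u_n = rng.normal(size=K)                # input u(n)
    x = np.tanh(W @ x + W_in @ u_n + b)     # x(n) = sigma(W x(n-1) + W^in u(n) + b)
    y = W_out @ x                           # y(n) = f(W^out x(n)), with f = identity here
```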
Derivation
First, let's consider how the activations of an RNN are computed at each time step:
- Consider how the activations are calculated for a regular feedforward network:
\[\htmlClass{sdt-0000000050}{x^\kappa} = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000059}{\mathbf{W}}[1;x^{\htmlClass{sdt-0000000015}{k} - 1}])\]
- We will use a general form of this equation which explicitly mentions the bias \( \htmlClass{sdt-0000000082}{\mathcal{b}} \). We will also use the weight matrix \(\htmlClass{sdt-0000000059}{\mathbf{W}}^{in}\) to convert the \( \htmlClass{sdt-0000000109}{K} \)-dimensional input to the \( \htmlClass{sdt-0000000119}{L} \) dimensions of the hidden neurons. Thus, for the input's contribution to the activation, we obtain:
\[ \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n) + \htmlClass{sdt-0000000082}{\mathcal{b}}\]
- An RNN utilizes the previous activations in order to efficiently process sequential data. We can express the activations of the network at the previous time step as \(\htmlClass{sdt-0000000094}{\mathcal{x}}(n-1)\). These activations are multiplied by an \(\htmlClass{sdt-0000000119}{L} \times \htmlClass{sdt-0000000119}{L}\) weight matrix \( \htmlClass{sdt-0000000059}{\mathbf{W}} \).
- Adding this recurrent term to the processed input and applying the activation function \( \htmlClass{sdt-0000000079}{\sigma} \), we get the full state update (sketched in code after this list):
\[\htmlClass{sdt-0000000094}{\mathcal{x}}(n) = \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000094}{\mathcal{x}}(n-1) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n) + \htmlClass{sdt-0000000082}{\mathcal{b}})\]
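A single state-update step can be written directly from this equation. The sketch below (NumPy, with tanh standing in for \( \htmlClass{sdt-0000000079}{\sigma} \)) makes the \(\htmlClass{sdt-0000000119}{L} \times \htmlClass{sdt-0000000119}{L}\) and \(\htmlClass{sdt-0000000119}{L} \times \htmlClass{sdt-0000000109}{K}\) shapes explicit; the function name is chosen only for this illustration:

```python
import numpy as np

def rnn_state_update(x_prev, u_n, W, W_in, b):
    """x(n) = sigma(W x(n-1) + W^in u(n) + b), with sigma = tanh here.

    Shapes: x_prev (L,), u_n (K,), W (L, L), W_in (L, K), b (L,).
    """
    return np.tanh(W @ x_prev + W_in @ u_n + b)
```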
Now let's consider the output of an RNN.
- As highlighted above, the activations of the network at time step \(n\) can be summarized by \(\htmlClass{sdt-0000000094}{\mathcal{x}}(n)\).
- Consider the below equation for calculating the output of a layer:
\[\htmlClass{sdt-0000000068}{\mathbf{y}}=\htmlClass{sdt-0000000094}{\mathcal{x}}^{\htmlClass{sdt-0000000015}{k}}=\htmlClass{sdt-0000000059}{\mathbf{W}}^{\htmlClass{sdt-0000000015}{k}}\htmlClass{sdt-0000000094}{\mathcal{x}}^{\htmlClass{sdt-0000000015}{k}-1}\]
- To calculate the output \( \htmlClass{sdt-0000000068}{\mathbf{y}} \) of an RNN, we multiply the activations by the \(\htmlClass{sdt-0000000009}{M} \times \htmlClass{sdt-0000000119}{L}\) output matrix \(\htmlClass{sdt-0000000059}{\mathbf{W}}^{out}\), which maps the \( \htmlClass{sdt-0000000119}{L} \) activations to the \( \htmlClass{sdt-0000000009}{M} \)-dimensional output.
- Next, we can apply an arbitrary activation function \(f\) (a code sketch of this output map follows below). Thus we obtain:
\[\htmlClass{sdt-0000000068}{\mathbf{y}}(n) = f(\htmlClass{sdt-0000000059}{\mathbf{W}}^{out} \htmlClass{sdt-0000000094}{\mathcal{x}}(n))\]
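As a sketch, this output map is a single matrix-vector product followed by the chosen \(f\); here a softmax is used purely as an example of an arbitrary output nonlinearity, not as the required choice:

```python
import numpy as np

def rnn_output(x_n, W_out):
    """y(n) = f(W^out x(n)), with f chosen as a softmax for illustration.

    Shapes: x_n (L,), W_out (M, L); the result is an M-vector summing to 1.
    """
    z = W_out @ x_n
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()
```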
From the above derivations, we have arrived at the two update equations as required.