Description
Recurrent neural networks (RNNs) are a specialized class of artificial neural networks designed to excel at processing sequential data. Unlike traditional feedforward networks, where inputs are treated independently, RNNs possess a unique form of internal memory. This memory, represented by their hidden state, allows them to maintain and update information about previous elements in a sequence. The equations below summarize the behaviour of an RNN.
\[\begin{align*}
\htmlClass{sdt-0000000094}{\mathcal{x}}(n) &= \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000094}{\mathcal{x}}(n-1) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n) + \htmlClass{sdt-0000000082}{\mathcal{b}}) \\
\htmlClass{sdt-0000000068}{\mathbf{y}}(n) &= f(\htmlClass{sdt-0000000059}{\mathbf{W}}^{out} \htmlClass{sdt-0000000094}{\mathcal{x}}(n))
\end{align*}\]
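As a concrete illustration, here is a minimal NumPy sketch of these two equations, assuming a tanh nonlinearity for \( \htmlClass{sdt-0000000079}{\sigma} \), the identity for \(f\), and arbitrary illustrative dimensions (these choices are not prescribed by the equations themselves):

```python
import numpy as np

# Arbitrary illustrative dimensions: K inputs, L hidden neurons, M outputs
K, L, M = 3, 5, 2
T = 10  # sequence length

rng = np.random.default_rng(0)
W_in  = rng.normal(size=(L, K))   # input weight matrix  W^in  (L x K)
W     = rng.normal(size=(L, L))   # recurrent weight matrix W  (L x L)
W_out = rng.normal(size=(M, L))   # output weight matrix  W^out (M x L)
b     = np.zeros(L)               # bias b

x = np.zeros(L)                   # initial state x(0)
for n in range(1, T + 1):
    u_n = rng.normal(size=K)                # input u(n)
    x = np.tanh(W @ x + W_in @ u_n + b)     # x(n) = sigma(W x(n-1) + W^in u(n) + b)
    y = W_out @ x                           # y(n) = f(W^out x(n)), with f = identity here
```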
Derivation
First, let's consider how the activations of an RNN are computed at each time step:
- Consider how the activations are calculated for a regular feedforward network:
\[\htmlClass{sdt-0000000050}{x^\kappa} = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000059}{\mathbf{W}}[1;x^{\htmlClass{sdt-0000000015}{k} - 1}])\]
- We will use a general form of this equation which explicitly mentions the bias \( \htmlClass{sdt-0000000082}{\mathcal{b}} \). We will also use the weight matrix \(\htmlClass{sdt-0000000059}{\mathbf{W}}^{in}\) to convert the \( \htmlClass{sdt-0000000109}{K} \)-dimensional input to the \( \htmlClass{sdt-0000000119}{L} \) dimensions of the hidden neurons. Thus, for the input's contribution to the activation, we obtain:
\[ \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n) + \htmlClass{sdt-0000000082}{\mathcal{b}}\]
- An RNN utilizes the previous activations in order to efficiently process sequential data. We can express the activations of the network at the previous time step as \(\htmlClass{sdt-0000000094}{\mathcal{x}}(n-1)\). These activations are multiplied by an \(\htmlClass{sdt-0000000119}{L} \times \htmlClass{sdt-0000000119}{L}\) weight matrix \( \htmlClass{sdt-0000000059}{\mathbf{W}} \).
- Adding this recurrent term to the processed input and applying the activation function \( \htmlClass{sdt-0000000079}{\sigma} \), we get the full state update (sketched in code after this list):
\[\htmlClass{sdt-0000000094}{\mathcal{x}}(n) = \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000094}{\mathcal{x}}(n-1) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n) + \htmlClass{sdt-0000000082}{\mathcal{b}})\]
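A single state-update step can be written directly from this equation. The sketch below (NumPy, with tanh standing in for \( \htmlClass{sdt-0000000079}{\sigma} \)) makes the \(\htmlClass{sdt-0000000119}{L} \times \htmlClass{sdt-0000000119}{L}\) and \(\htmlClass{sdt-0000000119}{L} \times \htmlClass{sdt-0000000109}{K}\) shapes explicit; the function name is chosen only for this illustration:

```python
import numpy as np

def rnn_state_update(x_prev, u_n, W, W_in, b):
    """x(n) = sigma(W x(n-1) + W^in u(n) + b), with sigma = tanh here.

    Shapes: x_prev (L,), u_n (K,), W (L, L), W_in (L, K), b (L,).
    """
    return np.tanh(W @ x_prev + W_in @ u_n + b)
```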
Now let's consider the output of an RNN.
- As highlighted above, the activations of the network at time step \(n\) can be summarized by \(\htmlClass{sdt-0000000094}{\mathcal{x}}(n)\).
- Consider the below equation for calculating the output of a layer:
\[\htmlClass{sdt-0000000068}{\mathbf{y}}=\htmlClass{sdt-0000000094}{\mathcal{x}}^{\htmlClass{sdt-0000000015}{k}}=\htmlClass{sdt-0000000059}{\mathbf{W}}^{\htmlClass{sdt-0000000015}{k}}\htmlClass{sdt-0000000094}{\mathcal{x}}^{\htmlClass{sdt-0000000015}{k}-1}\]
- To calculate the output \( \htmlClass{sdt-0000000068}{\mathbf{y}} \) of an RNN, we multiply the activations by the \(\htmlClass{sdt-0000000009}{M} \times \htmlClass{sdt-0000000119}{L}\) output matrix \(\htmlClass{sdt-0000000059}{\mathbf{W}}^{out}\), which maps the \( \htmlClass{sdt-0000000119}{L} \) activations to the \( \htmlClass{sdt-0000000009}{M} \)-dimensional output.
- Next, we can apply an arbitrary activation function \(f\) (a code sketch of this output map follows below). Thus we obtain:
\[\htmlClass{sdt-0000000068}{\mathbf{y}}(n) = f(\htmlClass{sdt-0000000059}{\mathbf{W}}^{out} \htmlClass{sdt-0000000094}{\mathcal{x}}(n))\]
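As a sketch, this output map is a single matrix-vector product followed by the chosen \(f\); here a softmax is used purely as an example of an arbitrary output nonlinearity, not as the required choice:

```python
import numpy as np

def rnn_output(x_n, W_out):
    """y(n) = f(W^out x(n)), with f chosen as a softmax for illustration.

    Shapes: x_n (L,), W_out (M, L); the result is an M-vector summing to 1.
    """
    z = W_out @ x_n
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()
```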
From the above derivations, we have arrived at the two update equations as required.