Recurrent Neural Network with Output Feedback

Prerequisites

Recurrent Neural Network Update Equations:

\[\begin{align*} \mathcal{x}(n) &= \sigma (\mathbf{W} \mathcal{x}(n-1) + \mathbf{W}^{in} u(n) + \mathcal{b}) \\ \mathbf{y}(n) &= f(\mathbf{W}^{out} \mathcal{x}(n)) \end{align*}\]

Description

Instead of an RNN whose activation update uses only the current input and the previous activation state, as seen here, it can be beneficial to also feed the previous output back into the network. This introduces a feedback loop that lets the network condition its state on what it has already produced, promoting adaptability, context-awareness, and coherence throughout the modelling process.

\[\mathcal{x}(n) = \sigma (\mathbf{W} \mathcal{x}(n-1) + \mathbf{W}^{in} u(n) + \mathbf{W}^{fb} \mathbf{y}(n-1) + \mathcal{b})\]
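
As a concrete illustration, here is a minimal NumPy sketch of a single update step with output feedback. The function and parameter names (`step_with_feedback`, `W_in`, `W_fb`, and so on) are illustrative, and the output nonlinearity \(f\) is taken to be \(\tanh\) purely for concreteness.

```python
import numpy as np

def sigmoid(z):
    """Elementwise sigmoid, the state nonlinearity in the equation above."""
    return 1.0 / (1.0 + np.exp(-z))

def step_with_feedback(x_prev, u_n, y_prev, W, W_in, W_fb, W_out, b, f=np.tanh):
    """One update of an RNN whose state also receives the previous output.

    x_prev : (L,) previous activation state x(n-1)
    u_n    : (K,) current input u(n)
    y_prev : (M,) previous output y(n-1), fed back through W_fb
    """
    x_n = sigmoid(W @ x_prev + W_in @ u_n + W_fb @ y_prev + b)  # state update
    y_n = f(W_out @ x_n)                                        # output readout
    return x_n, y_n
```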

Symbols Used:

\( \mathbf{W} \)

This symbol represents the matrix containing the weights of a layer in a neural network.

\( \mathbf{y} \)

This symbol represents the output activation vector of a neural network.

\( \sigma \)

This symbol represents the sigmoid function.

\( \mathcal{b} \)

This symbol represents the bias of a layer in a neural network.

\( \mathcal{x} \)

This symbol represents the activations of a neural network layer in vector form.

\( u \)

This symbol denotes the input of a model.

Derivation

Recall the standard RNN update equation:

\[\begin{align*} \mathcal{x}(n) &= \sigma (\mathbf{W} \mathcal{x}(n-1) + \mathbf{W}^{in} u(n) + \mathcal{b}) \\ \mathbf{y}(n) &= f(\mathbf{W}^{out} \mathcal{x}(n)) \end{align*}\]

  1. We only need to modify the state update \(\mathcal{x}(n)\) so that it also uses the previous output \(\mathbf{y}(n-1)\).
  2. This can be done by introducing a new weight matrix \(\mathbf{W}^{fb}\) for the feedback connection. Its purpose is to map the \( M \)-dimensional output into the \( L \)-dimensional state space (see the shape check after this derivation).
  3. Adding the product of this matrix and the previous output inside the activation gives:
    \[\mathcal{x}(n) = \sigma (\mathbf{W} \mathcal{x}(n-1) + \mathbf{W}^{in} u(n) + \mathbf{W}^{fb} \mathbf{y}(n-1) + \mathcal{b}) \]
    as required.
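
To make step 2 concrete, the following sketch runs a few updates with illustrative dimensions (\(L = 4\) for the state, \(K = 3\) for the input, \(M = 2\) for the output) and checks that \(\mathbf{W}^{fb}\) must have shape \(L \times M\) for the feedback term to land in the state space. All weights are random placeholders: this verifies only the wiring, not a trained model.

```python
import numpy as np

L, K, M = 4, 3, 2                    # state, input, output dimensions (illustrative)
rng = np.random.default_rng(0)

W     = rng.standard_normal((L, L))  # recurrent weights
W_in  = rng.standard_normal((L, K))  # input weights
W_fb  = rng.standard_normal((L, M))  # feedback weights: map M-dim output to L-dim state
W_out = rng.standard_normal((M, L))  # readout weights
b     = np.zeros(L)

x, y = np.zeros(L), np.zeros(M)      # initial state x(0) and output y(0)
for n in range(5):
    u = rng.standard_normal(K)                                     # input u(n)
    x = 1.0 / (1.0 + np.exp(-(W @ x + W_in @ u + W_fb @ y + b)))   # state x(n)
    y = np.tanh(W_out @ x)                                         # output y(n), f = tanh
print(x.shape, y.shape)  # -> (4,) (2,)
```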

Example

Let's consider the domain of language modelling. When tasked with generating human-like text, an RNN can greatly benefit from incorporating its previous outputs. By taking the predicted word at a given step and reintroducing it as input for the next step, the model maintains a better "memory" of its generation. This encourages the production of sentences that are not only grammatically sound but also exhibit a natural flow and internal consistency, mirroring the qualities found in human-written text.
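
A rough sketch of that loop in code: at each step the predicted word, encoded one-hot, becomes the \(\mathbf{y}(n-1)\) that re-enters the state through \(\mathbf{W}^{fb}\). The tiny vocabulary and random untrained weights are placeholders, and the external input \(u(n)\) is dropped so the network runs in pure generation mode; with untrained weights the output is of course meaningless, and the point is only the feedback wiring.

```python
import numpy as np

vocab = ["the", "cat", "sat"]     # toy vocabulary (placeholder)
M, L = len(vocab), 8              # output and state dimensions
rng = np.random.default_rng(1)

W     = 0.1 * rng.standard_normal((L, L))
W_fb  = 0.1 * rng.standard_normal((L, M))   # carries the previous word back into the state
W_out = 0.1 * rng.standard_normal((M, L))
b     = np.zeros(L)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x, y = np.zeros(L), np.zeros(M)   # empty state and "no previous word"
words = []
for n in range(5):
    # y(n-1) re-enters the state through W_fb
    x = 1.0 / (1.0 + np.exp(-(W @ x + W_fb @ y + b)))
    probs = softmax(W_out @ x)    # distribution over the next word
    k = int(probs.argmax())
    y = np.eye(M)[k]              # one-hot of the predicted word is fed back
    words.append(vocab[k])
print(" ".join(words))
```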
