Instead of an RNN whose activation update uses only the current input and the previous activation state, as in the standard equation recalled below, it can be useful to feed the previous output back into the network. This introduces a feedback loop: the network's own predictions become part of its state, which helps the model stay adaptable, context-aware, and consistent with what it has already produced.
| Symbol | Description |
| --- | --- |
| \( \mathbf{W} \) | The matrix containing the connection weights of a layer in a neural network. |
| \( \mathbf{y} \) | The output activation vector of a neural network. |
| \( \sigma \) | The sigmoid function. |
| \( \mathcal{b} \) | The bias of a layer in a neural network. |
| \( \mathcal{x} \) | The activations of a neural network layer, in vector form. |
| \( u \) | The input of a model. |
Recall the standard RNN update equation:
\[\begin{align*} \htmlClass{sdt-0000000094}{\mathcal{x}}(n) &= \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000094}{\mathcal{x}}(n-1) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n) + \htmlClass{sdt-0000000082}{\mathcal{b}}) \\ \htmlClass{sdt-0000000068}{\mathbf{y}}(n) &= f(\htmlClass{sdt-0000000059}{\mathbf{W}}^{out} \htmlClass{sdt-0000000094}{\mathcal{x}}(n)) \end{align*}\]
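To feed the previous output back into the state update, one common formulation (used here by analogy with \( \mathbf{W}^{in} \) and \( \mathbf{W}^{out} \); the name \( \mathbf{W}^{fb} \) is a notational choice, not fixed above) adds a feedback weight matrix that projects \( \mathbf{y}(n-1) \) into the activation update:

\[\begin{align*} \htmlClass{sdt-0000000094}{\mathcal{x}}(n) &= \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000094}{\mathcal{x}}(n-1) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{fb} \htmlClass{sdt-0000000068}{\mathbf{y}}(n-1) + \htmlClass{sdt-0000000082}{\mathcal{b}}) \\ \htmlClass{sdt-0000000068}{\mathbf{y}}(n) &= f(\htmlClass{sdt-0000000059}{\mathbf{W}}^{out} \htmlClass{sdt-0000000094}{\mathcal{x}}(n)) \end{align*}\]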
Let's consider the domain of language modelling. When tasked with generating human-like text, an RNN can greatly benefit from incorporating its previous outputs. By taking the word predicted at one step and reintroducing it as input for the next step, the model conditions each new prediction on what it has already generated. This encourages sentences that are not only grammatically sound but also coherent and internally consistent, closer to the qualities found in human-written text.
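As a concrete illustration, here is a minimal NumPy sketch of this kind of update with output feedback. The names and dimensions (the feedback matrix `W_fb`, the toy vocabulary size, the softmax readout as \( f \)) are illustrative assumptions, not something defined above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: a 10-word toy vocabulary and a 32-unit hidden state.
n_in, n_hidden, n_out = 10, 32, 10

# Weight matrices: recurrent (W), input (W_in), output feedback (W_fb), readout (W_out).
W     = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
W_in  = rng.normal(scale=0.1, size=(n_hidden, n_in))
W_fb  = rng.normal(scale=0.1, size=(n_hidden, n_out))
W_out = rng.normal(scale=0.1, size=(n_out, n_hidden))
b     = np.zeros(n_hidden)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def step(x_prev, u, y_prev):
    """One update with output feedback:
    x(n) = sigmoid(W x(n-1) + W_in u(n) + W_fb y(n-1) + b)
    y(n) = softmax(W_out x(n))
    """
    x = sigmoid(W @ x_prev + W_in @ u + W_fb @ y_prev + b)
    y = softmax(W_out @ x)
    return x, y

# Free-running generation: after a start token, the previously predicted word
# is fed back in as the next input, alongside the output-feedback term.
x = np.zeros(n_hidden)
y = np.zeros(n_out)
u = np.zeros(n_in)
u[0] = 1.0                            # one-hot start token
for n in range(5):
    x, y = step(x, u, y)
    u = np.eye(n_in)[y.argmax()]      # next input: the predicted word, one-hot encoded
```

In practice, during training the fed-back output is often the ground-truth previous word (teacher forcing), while at generation time the model's own prediction is used, as in the loop above.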