This equation describes the calculation behind the activation of a layer in a multi-layer perceptron. It generalizes the Activation of a neuron to matrix form.
| Symbol | Meaning |
| --- | --- |
| \( k \) | This symbol represents any given integer, \( k \in \htmlClass{sdt-0000000122}{\mathbb{Z}}\). |
| \( x^\kappa \) | This symbol represents the activations of neurons in the \(\kappa\)-th layer of a multi-layer perceptron. |
| \( \sigma \) | This symbol represents the activation function. It maps real values to other real values in a non-linear way. |
| \( \mathbf{W} \) | This symbol represents the matrix containing the weights and biases of a layer in a neural network. |
First, recall the activation of a single neuron: \[\htmlClass{sdt-0000000030}{x^\kappa_i} = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000080}{\sum}_{\htmlClass{sdt-0000000011}{j}=1}^{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k} - 1}}\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} \htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}} x_{\htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}-1} + \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 0}^{\htmlClass{sdt-0000000015}{k}})\]
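To make this concrete, here is a minimal NumPy sketch of the per-neuron computation. The function and variable names are illustrative rather than from the text, and \(\tanh\) stands in for an arbitrary activation \(\sigma\).

```python
import numpy as np

# A minimal sketch of the single-neuron formula above. `theta` holds
# the weights of one neuron, `theta_0` its bias, and `x_prev` the
# activations of the previous layer; tanh stands in for sigma.
def neuron_activation(theta, theta_0, x_prev, sigma=np.tanh):
    # sum_j theta_ij * x_j^{k-1}, plus the bias theta_i0
    return sigma(np.dot(theta, x_prev) + theta_0)

print(neuron_activation(np.array([0.1, 0.2, 0.3]), 0.7,
                        np.array([0.9, 1.0, 1.1])))
```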
We can rewrite this in vector form, where \(\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i}}^{\htmlClass{sdt-0000000015}{k}} \in \mathbb{R}^{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k} - 1}}\) is the weight vector of the \(\htmlClass{sdt-0000000018}{i}\)-th neuron and \(\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 0}^{\htmlClass{sdt-0000000015}{k}}\) is its bias.
\[\htmlClass{sdt-0000000030}{x^\kappa_i} = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i}}^{\htmlClass{sdt-0000000015}{k}} \cdot x^{\htmlClass{sdt-0000000015}{k} - 1} + \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 0}^{\htmlClass{sdt-0000000015}{k}})\]
The activations of all neurons in the \(\htmlClass{sdt-0000000015}{k}\)-th layer are denoted as \(\htmlClass{sdt-0000000050}{x^\kappa} = (\htmlClass{sdt-0000000050}{x^\kappa}_1, \dots, \htmlClass{sdt-0000000050}{x^\kappa}_{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k}}})\).
If we stack the weight vectors of all neurons as rows, with the biases as the first column, we end up with the weight matrix \( \htmlClass{sdt-0000000059}{\mathbf{W}} \).
Now, since a matrix-vector product is essentially a stack of dot products, one per row, we can combine the per-neuron operations into a single matrix equation.
\[\htmlClass{sdt-0000000050}{x^\kappa} = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000059}{\mathbf{W}}[1;x^{\htmlClass{sdt-0000000015}{k} - 1}]) \enspace .\]
The 1 is concatenated to the activations of the previous layer in order to accommodate the bias. Notice that since the first column of the weight matrix \(\htmlClass{sdt-0000000059}{\mathbf{W}}\) contains the biases, they are always multiplied by this constant 1 rather than by any of the actual activations, which is exactly the point of a bias.
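As a sketch of this matrix form (assuming NumPy; the helper name `layer_activation` is ours), the forward pass of one layer can be written as:

```python
import numpy as np

# A sketch of the matrix form above: the biases sit in the first
# column of W, and a 1 is prepended to the previous layer's
# activations so the biases are picked up by the matrix product.
def layer_activation(W, x_prev, sigma):
    x_aug = np.concatenate(([1.0], x_prev))  # [1; x^{k-1}]
    return sigma(W @ x_aug)                  # sigma(W [1; x^{k-1}])
```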
Assume that the \(\htmlClass{sdt-0000000015}{k}\)-th layer has 3 inputs and 2 outputs, and uses the Rectified Linear Unit (ReLU) as its activation function.
Then we can define the weights for each neuron:
\[ \htmlClass{sdt-0000000066}{\theta}^{\htmlClass{sdt-0000000015}{k}}_1 = \begin{bmatrix} 0.1 & 0.2 & 0.3\end{bmatrix} \\ \htmlClass{sdt-0000000066}{\theta}^{\htmlClass{sdt-0000000015}{k}}_2 = \begin{bmatrix} 0.4 & 0.5 & 0.6\end{bmatrix} \]
We also define the bias of each neuron:
\[ \htmlClass{sdt-0000000066}{\theta}^{\htmlClass{sdt-0000000015}{k}}_{10} = 0.7 \\ \htmlClass{sdt-0000000066}{\theta}^{\htmlClass{sdt-0000000015}{k}}_{20} = 0.8 \]
Then, we can construct the weight matrix.
\[ \htmlClass{sdt-0000000059}{\mathbf{W}} = \begin{bmatrix} \htmlClass{sdt-0000000066}{\theta}^{\htmlClass{sdt-0000000015}{k}}_{10} & \htmlClass{sdt-0000000066}{\theta}^{\htmlClass{sdt-0000000015}{k}}_1 \\ \htmlClass{sdt-0000000066}{\theta}^{\htmlClass{sdt-0000000015}{k}}_{20} & \htmlClass{sdt-0000000066}{\theta}^{\htmlClass{sdt-0000000015}{k}}_2 \\ \end{bmatrix} \]
\[ \htmlClass{sdt-0000000059}{\mathbf{W}} = \begin{bmatrix} 0.7 & 0.1 & 0.2 & 0.3 \\ 0.8 & 0.4 & 0.5 & 0.6 \\ \end{bmatrix} \]
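In code, stacking the weight vectors and biases into \( \htmlClass{sdt-0000000059}{\mathbf{W}} \) might look like this (a NumPy sketch; the variable names are ours):

```python
import numpy as np

theta_1 = np.array([0.1, 0.2, 0.3])  # weights of neuron 1
theta_2 = np.array([0.4, 0.5, 0.6])  # weights of neuron 2
biases = np.array([0.7, 0.8])        # biases of neurons 1 and 2

# Biases form the first column; each weight vector becomes a row.
W = np.column_stack([biases, np.vstack([theta_1, theta_2])])
print(W)  # [[0.7 0.1 0.2 0.3]
          #  [0.8 0.4 0.5 0.6]]
```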
If we denote the output of the previous layer as
\[x^{\htmlClass{sdt-0000000015}{k} - 1}= \begin{bmatrix} 0.9 \\ 1.0 \\ 1.1\end{bmatrix},\]
then
\[[1;x^{\htmlClass{sdt-0000000015}{k} - 1}] = \begin{bmatrix} 1 \\ 0.9 \\ 1.0 \\ 1.1\end{bmatrix} \]
which finally allows us to calculate the activation of our layer:
\[ \htmlClass{sdt-0000000050}{x^\kappa} = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000059}{\mathbf{W}}[1;x^{\htmlClass{sdt-0000000015}{k} - 1}]) \]
We then substitute all the values to obtain the result.
\[ \htmlClass{sdt-0000000050}{x^\kappa} = \text{relu}(\htmlClass{sdt-0000000059}{\mathbf{W}}[1;x^{\htmlClass{sdt-0000000015}{k} - 1}]) \]
\[ \htmlClass{sdt-0000000050}{x^\kappa} = \text{relu}( \begin{bmatrix} 0.7 & 0.1 & 0.2 & 0.3 \\ 0.8 & 0.4 & 0.5 & 0.6 \\ \end{bmatrix} \begin{bmatrix} 1 \\ 0.9 \\ 1.0 \\ 1.1\end{bmatrix} ) \]
\[ \htmlClass{sdt-0000000050}{x^\kappa} = \text{relu}( \begin{bmatrix} 1.32 \\ 2.32 \\ \end{bmatrix} ) \]
\[ \htmlClass{sdt-0000000050}{x^\kappa} = \text{max}( \begin{bmatrix} 1.32 \\ 2.32 \\ \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ \end{bmatrix} ) \]
\[ \htmlClass{sdt-0000000050}{x^\kappa} = \begin{bmatrix} 1.32 \\ 2.32 \\ \end{bmatrix} \]
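As a sanity check, the same computation in NumPy (a sketch mirroring the worked example, with ReLU implemented via `np.maximum`) reproduces these numbers:

```python
import numpy as np

W = np.array([[0.7, 0.1, 0.2, 0.3],
              [0.8, 0.4, 0.5, 0.6]])
x_prev = np.array([0.9, 1.0, 1.1])

x_aug = np.concatenate(([1.0], x_prev))  # [1; x^{k-1}]
x_k = np.maximum(W @ x_aug, 0.0)         # relu(W [1; x^{k-1}])
print(x_k)                               # ≈ [1.32 2.32]
```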