This equation describes the calculation behind the activation of a layer in a multi-layer perceptron. It generalizes the Activation of a neuron to matrix form.
| Symbol | Meaning |
| --- | --- |
| \( k \) | This symbol represents any given integer, \( k \in \htmlClass{sdt-0000000122}{\mathbb{Z}}\). |
| \( x^\kappa \) | This symbol represents the activations of neurons in the \(\kappa\)-th layer of a multi-layer perceptron. |
| \( \sigma \) | This symbol represents the activation function. It maps real values to other real values in a non-linear way. |
| \( \mathbf{W} \) | This symbol represents the matrix containing the weights and biases of a layer in a neural network. |
First, recall the activation of a single neuron: \[\htmlClass{sdt-0000000030}{x^\kappa_i} = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000080}{\sum}_{\htmlClass{sdt-0000000011}{j}=1}^{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k} - 1}}\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} \htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}} x_{\htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}-1} + \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 0}^{\htmlClass{sdt-0000000015}{k}})\]
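To make this concrete, here is a minimal NumPy sketch of the per-neuron computation. The function and variable names are illustrative rather than from the text, and \(\tanh\) stands in for an arbitrary activation \(\sigma\).

```python
import numpy as np

# A minimal sketch of the single-neuron formula above. `theta` holds
# the weights of one neuron, `theta_0` its bias, and `x_prev` the
# activations of the previous layer; tanh stands in for sigma.
def neuron_activation(theta, theta_0, x_prev, sigma=np.tanh):
    # sum_j theta_ij * x_j^{k-1}, plus the bias theta_i0
    return sigma(np.dot(theta, x_prev) + theta_0)

print(neuron_activation(np.array([0.1, 0.2, 0.3]), 0.7,
                        np.array([0.9, 1.0, 1.1])))
```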
We can rewrite this in vector form, where \(\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i}}^{\htmlClass{sdt-0000000015}{k}} \in \mathbb{R}^{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k} - 1}}\) is the weight vector of the \(\htmlClass{sdt-0000000018}{i}\)-th neuron and \(\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 0}^{\htmlClass{sdt-0000000015}{k}}\) is its bias.
\[\htmlClass{sdt-0000000030}{x^\kappa_i} = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i}}^{\htmlClass{sdt-0000000015}{k}} \cdot x^{\htmlClass{sdt-0000000015}{k} - 1} + \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 0}^{\htmlClass{sdt-0000000015}{k}})\]
The activations of all neurons in the \(\htmlClass{sdt-0000000015}{k}\)-th layer are denoted as \(\htmlClass{sdt-0000000050}{x^\kappa} = (\htmlClass{sdt-0000000050}{x^\kappa}_1, \dots, \htmlClass{sdt-0000000050}{x^\kappa}_{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k}}})\).
If we stack the weight vectors of all neurons as rows, with the biases as the first column, we end up with the weight matrix \( \htmlClass{sdt-0000000059}{\mathbf{W}} \).
Now, since a matrix-vector product is essentially a stack of dot products, one per row, we can combine the per-neuron operations into a single matrix equation.
\[\htmlClass{sdt-0000000050}{x^\kappa} = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000059}{\mathbf{W}}[1;x^{\htmlClass{sdt-0000000015}{k} - 1}]) \enspace .\]
The 1 is concatenated to the activations of the previous layer in order to accommodate the bias. Notice that since the first column of the weight matrix \(\htmlClass{sdt-0000000059}{\mathbf{W}}\) contains the biases, they are always multiplied by this constant 1 rather than by any of the actual activations, which is exactly the point of a bias.
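As a sketch of this matrix form (assuming NumPy; the helper name `layer_activation` is ours), the forward pass of one layer can be written as:

```python
import numpy as np

# A sketch of the matrix form above: the biases sit in the first
# column of W, and a 1 is prepended to the previous layer's
# activations so the biases are picked up by the matrix product.
def layer_activation(W, x_prev, sigma):
    x_aug = np.concatenate(([1.0], x_prev))  # [1; x^{k-1}]
    return sigma(W @ x_aug)                  # sigma(W [1; x^{k-1}])
```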
Assume that the \(\htmlClass{sdt-0000000015}{k}\)-th layer has 3 inputs and 2 outputs, and uses the Rectified Linear Unit (ReLU) as its activation function.
Then we can define the weights for each neuron:
\[ \htmlClass{sdt-0000000066}{\theta}^{\htmlClass{sdt-0000000015}{k}}_1 = \begin{bmatrix} 0.1 & 0.2 & 0.3\end{bmatrix} \\ \htmlClass{sdt-0000000066}{\theta}^{\htmlClass{sdt-0000000015}{k}}_2 = \begin{bmatrix} 0.4 & 0.5 & 0.6\end{bmatrix} \]
We also define the bias of each neuron:
\[ \htmlClass{sdt-0000000066}{\theta}^{\htmlClass{sdt-0000000015}{k}}_{10} = 0.7 \\ \htmlClass{sdt-0000000066}{\theta}^{\htmlClass{sdt-0000000015}{k}}_{20} = 0.8 \]
Then, we can construct the weight matrix.
\[ \htmlClass{sdt-0000000059}{\mathbf{W}} = \begin{bmatrix} \htmlClass{sdt-0000000066}{\theta}^{\htmlClass{sdt-0000000015}{k}}_{10} & \htmlClass{sdt-0000000066}{\theta}^{\htmlClass{sdt-0000000015}{k}}_1 \\ \htmlClass{sdt-0000000066}{\theta}^{\htmlClass{sdt-0000000015}{k}}_{20} & \htmlClass{sdt-0000000066}{\theta}^{\htmlClass{sdt-0000000015}{k}}_2 \\ \end{bmatrix} \]
\[ \htmlClass{sdt-0000000059}{\mathbf{W}} = \begin{bmatrix} 0.7 & 0.1 & 0.2 & 0.3 \\ 0.8 & 0.4 & 0.5 & 0.6 \\ \end{bmatrix} \]
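In code, stacking the weight vectors and biases into \( \htmlClass{sdt-0000000059}{\mathbf{W}} \) might look like this (a NumPy sketch; the variable names are ours):

```python
import numpy as np

theta_1 = np.array([0.1, 0.2, 0.3])  # weights of neuron 1
theta_2 = np.array([0.4, 0.5, 0.6])  # weights of neuron 2
biases = np.array([0.7, 0.8])        # biases of neurons 1 and 2

# Biases form the first column; each weight vector becomes a row.
W = np.column_stack([biases, np.vstack([theta_1, theta_2])])
print(W)  # [[0.7 0.1 0.2 0.3]
          #  [0.8 0.4 0.5 0.6]]
```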
If we denote the output of the previous layer as
\[x^{\htmlClass{sdt-0000000015}{k} - 1}= \begin{bmatrix} 0.9 \\ 1.0 \\ 1.1\end{bmatrix},\]
then
\[[1;x^{\htmlClass{sdt-0000000015}{k} - 1}] = \begin{bmatrix} 1 \\ 0.9 \\ 1.0 \\ 1.1\end{bmatrix} \]
which finally allows us to calculate the activation of our layer:
\[ \htmlClass{sdt-0000000050}{x^\kappa} = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000059}{\mathbf{W}}[1;x^{\htmlClass{sdt-0000000015}{k} - 1}]) \]
We then substitute all the values to obtain the result.
\[ \htmlClass{sdt-0000000050}{x^\kappa} = \text{relu}(\htmlClass{sdt-0000000059}{\mathbf{W}}[1;x^{\htmlClass{sdt-0000000015}{k} - 1}]) \]
\[ \htmlClass{sdt-0000000050}{x^\kappa} = \text{relu}( \begin{bmatrix} 0.7 & 0.1 & 0.2 & 0.3 \\ 0.8 & 0.4 & 0.5 & 0.6 \\ \end{bmatrix} \begin{bmatrix} 1 \\ 0.9 \\ 1.0 \\ 1.1\end{bmatrix} ) \]
\[ \htmlClass{sdt-0000000050}{x^\kappa} = \text{relu}( \begin{bmatrix} 1.32 \\ 2.32 \\ \end{bmatrix} ) \]
\[ \htmlClass{sdt-0000000050}{x^\kappa} = \text{max}( \begin{bmatrix} 1.32 \\ 2.32 \\ \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ \end{bmatrix} ) \]
\[ \htmlClass{sdt-0000000050}{x^\kappa} = \begin{bmatrix} 1.32 \\ 2.32 \\ \end{bmatrix} \]
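As a sanity check, the same computation in NumPy (a sketch mirroring the worked example, with ReLU implemented via `np.maximum`) reproduces these numbers:

```python
import numpy as np

W = np.array([[0.7, 0.1, 0.2, 0.3],
              [0.8, 0.4, 0.5, 0.6]])
x_prev = np.array([0.9, 1.0, 1.1])

x_aug = np.concatenate(([1.0], x_prev))  # [1; x^{k-1}]
x_k = np.maximum(W @ x_aug, 0.0)         # relu(W [1; x^{k-1}])
print(x_k)                               # ≈ [1.32 2.32]
```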