
Mean Squared Error Loss (MSE)

Prerequisites

General Form of a Loss Function | \(L : \mathbb{R}^{n} \times \mathbb{R}^{n} \rightarrow \mathbb{R}_{\geq 0}\)
Ground Truth | \( y \)
Model | \( h \)
Input | \( u \)

Description

The Mean Squared Error (MSE) loss is an adaptation of the Quadratic (L2) loss that accounts for the number of input-output pairs used.

\[L_\text{MSE}(h(u), y) = \frac{1}{N} \sum_{i=1}^{N} (h(u_i) - y_i)^2\]
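
As a sketch of how this formula translates to code, the following minimal NumPy implementation (the function name `mse_loss` is illustrative, not from the source) averages the squared differences between a prediction vector and a target vector:

```python
import numpy as np

def mse_loss(predictions: np.ndarray, targets: np.ndarray) -> float:
    """Mean Squared Error: the average of the squared elementwise differences."""
    # Squared errors (h(u_i) - y_i)^2, averaged over the N samples.
    return float(np.mean((predictions - targets) ** 2))
```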

Symbols Used:

\( L \)

This is the symbol for a loss function. It is a function that calculates how wrong a model's prediction is compared to where it should be.

\( h \)

This symbol denotes a model in machine learning.

\( y \)

This symbol stands for the ground truth of a sample. In supervised learning this is often paired with the corresponding input.

\( u \)

This symbol denotes the input of a model.

Derivation

MSE is a loss function, so it takes the form:

\[L : \mathbb{R}^{n} \times \mathbb{R}^{n} \rightarrow \mathbb{R}_{\geq 0}\]
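
In code, this signature corresponds to a function taking two length-\(n\) vectors and returning a non-negative scalar; a hypothetical Python type alias capturing it might look like:

```python
from typing import Callable

import numpy as np

# A loss function maps two vectors in R^n (prediction and ground truth)
# to a single non-negative real number.
LossFunction = Callable[[np.ndarray, np.ndarray], float]
```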

The MSE loss is the L2 loss normalized with respect to the size of the data set; the mean is often more informative than a plain sum of errors because it does not grow with the number of samples (a numerical check follows the derivation below).

  1. Consider the definition of the ground truth:

    The symbol \(y\) represents the ground truth of a sample in machine learning. Samples come in pairs consisting of the input and the ground truth, or "target output".

  2. Now consider the definition of a model prediction:

    The symbol for a model is \(h\). It represents a machine learning model that takes an input and gives an output.


    given the input:

    The symbol \(u\) represents the input of a model.

  3. Consider the elements of the model prediction \( h(u) \) and the ground truth \( y \):
    \[ h(u) = \begin{bmatrix} h(u_1)\\ h(u_2)\\ \vdots\\ h(u_N) \end{bmatrix} \qquad y = \begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_N \end{bmatrix} \]
  4. The Quadratic Loss is then:
    \[ \begin{align*} L &= \Vert h(u) - y \Vert^2 \\ &= \sum_{i=1}^{N} (h(u_i) - y_i)^2 \end{align*} \]
  5. Dividing by the number of samples gives the MSE:
    \[ L_\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (h(u_i) - y_i)^2 \]
    as required.
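
The normalization step can also be checked numerically; the sketch below, using an arbitrary random dataset, confirms that the MSE equals the L2 loss divided by \(N\):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
h_u = rng.normal(size=N)  # stand-in for the predictions h(u_i)
y = rng.normal(size=N)    # stand-in for the ground truths y_i

l2 = np.sum((h_u - y) ** 2)    # Quadratic (L2) loss
mse = np.mean((h_u - y) ** 2)  # MSE loss

assert np.isclose(mse, l2 / N)  # the MSE is the L2 loss divided by N
```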

Example

Assume we want to fit a quadratic polynomial to the values \( y = (1, 0, 2) \) generated from the parabola \( y = 0 + \frac{1}{2} x + \frac{3}{2} x^2 \).

We choose a model \( h \) in the form of a quadratic polynomial: \( h(u_i) = a_0 + a_1 u_i + a_2 u_i^2 \) with unknown coefficients \( a_0, a_1, a_2 \).

Now consider the inputs \( u = (-1, 0, 1) \) with model predictions \( h(u) = (0, 1, 4) \) (obtained, for instance, with coefficients \( a_0 = 1 \), \( a_1 = 2 \), \( a_2 = 1 \)).

The MSE loss is:

\[ \begin{align*} L_\text{MSE} &= \frac{1}{N} \sum_{i=1}^{N} (h(u_i) - y_i)^2 \\ &= \frac{1}{3} \left[ (0 - 1)^2 + (1 - 0)^2 + (4 - 2)^2 \right] \\ &= \frac{1}{3}(1 + 1 + 4) \\ &= \frac{6}{3} = 2 \end{align*} \]

On the other hand, the unnormalized L2 loss is \(6\).
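
These numbers can be reproduced with a few lines of Python (a sketch using the vectors from the example above):

```python
import numpy as np

h_u = np.array([0.0, 1.0, 4.0])  # model predictions h(u)
y = np.array([1.0, 0.0, 2.0])    # ground truth values

l2 = np.sum((h_u - y) ** 2)    # quadratic (L2) loss -> 6.0
mse = np.mean((h_u - y) ** 2)  # MSE loss -> 2.0
print(f"L2 = {l2}, MSE = {mse}")
```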
