
Mean Squared Error Loss (MSE)

Prerequisites

General Form of a Loss Function | \(L : \mathbb{R}^{n} \times \mathbb{R}^{n} \rightarrow \mathbb{R}_{\geq 0}\)
Ground Truth | \( y \)
Model | \( h \)
Input | \( u \)

Description

The Mean Squared Error (MSE) loss is an adaptation of the Quadratic (L2) loss that accounts for the number of input-output pairs used.

\[L_\text{MSE}(h(u), y) = \frac{1}{N} \sum_{i=1}^{N} (h(u_i) - y_i)^2\]
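
As a sketch of how this formula translates to code, the following minimal NumPy implementation (the function name `mse_loss` is illustrative, not from the source) averages the squared differences between a prediction vector and a target vector:

```python
import numpy as np

def mse_loss(predictions: np.ndarray, targets: np.ndarray) -> float:
    """Mean Squared Error: the average of the squared elementwise differences."""
    # Squared errors (h(u_i) - y_i)^2, averaged over the N samples.
    return float(np.mean((predictions - targets) ** 2))
```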

Symbols Used:

\( L \)

This is the symbol for a loss function. It is a function that calculates how wrong a model's prediction is compared to where it should be.

\( h \)

This symbol denotes a model in machine learning.

\( y \)

This symbol stands for the ground truth of a sample. In supervised learning this is often paired with the corresponding input.

\( u \)

This symbol denotes the input of a model.

Derivation

MSE is a loss function, so it takes the form:

\[L : \mathbb{R}^{n} \times \mathbb{R}^{n} \rightarrow \mathbb{R}_{\geq 0}\]
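
In code, this signature corresponds to a function taking two length-\(n\) vectors and returning a non-negative scalar; a hypothetical Python type alias capturing it might look like:

```python
from typing import Callable

import numpy as np

# A loss function maps two vectors in R^n (prediction and ground truth)
# to a single non-negative real number.
LossFunction = Callable[[np.ndarray, np.ndarray], float]
```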

The MSE loss is the L2 loss normalized with respect to the size of the data set; the mean is often more informative than a plain sum of errors because it does not grow with the number of samples (a numerical check follows the derivation below).

  1. Consider the definition of the ground truth:

    The symbol \(y\) represents the ground truth of a sample in machine learning. Samples come in pairs consisting of the input and the ground truth, or "target output".

  2. Now consider the definition of a model prediction:

    The symbol for a model is \(h\). It represents a machine learning model that takes an input and gives an output.


    given the input:

    The symbol \(u\) represents the input of a model.

  3. Consider the elements of the model prediction \( h(u) \) and the ground truth \( y \):
    \[ h(u) = \begin{bmatrix} h(u_1)\\ h(u_2)\\ \vdots\\ h(u_N) \end{bmatrix} \qquad y = \begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_N \end{bmatrix} \]
  4. The Quadratic Loss is then:
    \[ \begin{align*} L &= \Vert h(u) - y \Vert^2 \\ &= \sum_{i=1}^{N} (h(u_i) - y_i)^2 \end{align*} \]
  5. Dividing by the number of samples gives the MSE:
    \[ L_\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (h(u_i) - y_i)^2 \]
    as required.
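
The normalization step can also be checked numerically; the sketch below, using an arbitrary random dataset, confirms that the MSE equals the L2 loss divided by \(N\):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
h_u = rng.normal(size=N)  # stand-in for the predictions h(u_i)
y = rng.normal(size=N)    # stand-in for the ground truths y_i

l2 = np.sum((h_u - y) ** 2)    # Quadratic (L2) loss
mse = np.mean((h_u - y) ** 2)  # MSE loss

assert np.isclose(mse, l2 / N)  # the MSE is the L2 loss divided by N
```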

Example

Assume we want to fit a quadratic polynomial to the values \( y = (1, 0, 2) \) generated from the parabola \( y = 0 + \frac{1}{2} x + \frac{3}{2} x^2 \).

We choose a model \( h \) in the form of a quadratic polynomial: \( h(u_i) = a_0 + a_1 u_i + a_2 u_i^2 \) with unknown coefficients \( a_0, a_1, a_2 \).

Now consider the inputs \( u = (-1, 0, 1) \) with model predictions \( h(u) = (0, 1, 4) \) (obtained, for instance, with coefficients \( a_0 = 1 \), \( a_1 = 2 \), \( a_2 = 1 \)).

The MSE loss is:

\[ \begin{align*} L_\text{MSE} &= \frac{1}{N} \sum_{i=1}^{N} (h(u_i) - y_i)^2 \\ &= \frac{1}{3} \left[ (0 - 1)^2 + (1 - 0)^2 + (4 - 2)^2 \right] \\ &= \frac{1}{3}(1 + 1 + 4) \\ &= \frac{6}{3} = 2 \end{align*} \]

On the other hand, the unnormalized L2 loss is \(6\).
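
These numbers can be reproduced with a few lines of Python (a sketch using the vectors from the example above):

```python
import numpy as np

h_u = np.array([0.0, 1.0, 4.0])  # model predictions h(u)
y = np.array([1.0, 0.0, 2.0])    # ground truth values

l2 = np.sum((h_u - y) ** 2)    # quadratic (L2) loss -> 6.0
mse = np.mean((h_u - y) ** 2)  # MSE loss -> 2.0
print(f"L2 = {l2}, MSE = {mse}")
```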
