Quadratic Loss (L2)

Prerequisites

General Form of a Loss Function | \(L : \mathbb{R}^{n} \times \mathbb{R}^{n} \rightarrow \mathbb{R}_{\geq 0}\)
Ground Truth | \( y \)
Model | \( h \)
Input | \( u \)

Description

The quadratic loss quantifies the mismatch between a model's prediction and the target output as the squared norm of their difference. A convenient property of this is that the loss is the same whether the prediction is above or below the ground truth. It is one of the most commonly encountered loss functions in machine learning and is also known as the 'L2 loss'.

\[\htmlClass{sdt-0000000072}{L}(\htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}), \htmlClass{sdt-0000000037}{y}) = \Vert \htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}) - \htmlClass{sdt-0000000037}{y} \Vert ^2\]

Symbols Used:

\( L \)

This is the symbol for a loss function. It is a function that calculates how far a model's prediction is from where it should be.

\( h \)

This symbol denotes a model in machine learning.

\( y \)

This symbol stands for the ground truth of a sample. In supervised learning this is often paired with the corresponding input.

\( u \)

This symbol denotes the input of a model.
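
To make the definition concrete, here is a minimal NumPy sketch of the quadratic loss. The function name `quadratic_loss` and the use of NumPy are our own choices for illustration, not part of the material above.

```python
import numpy as np

def quadratic_loss(prediction: np.ndarray, target: np.ndarray) -> float:
    """Quadratic (L2) loss: the squared Euclidean norm of the prediction error."""
    difference = prediction - target              # h(u) - y
    return float(np.dot(difference, difference))  # ||h(u) - y||^2

# Over- and under-predicting by the same amount gives the same loss.
assert quadratic_loss(np.array([3.0]), np.array([2.0])) == 1.0
assert quadratic_loss(np.array([1.0]), np.array([2.0])) == 1.0
```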

Derivation

The quadratic loss is a loss function, and therefore takes the form:

\[\htmlClass{sdt-0000000072}{L} : \htmlClass{sdt-0000000045}{\mathbb{R}}^{\htmlClass{sdt-0000000117}{n}} \times \htmlClass{sdt-0000000045}{\mathbb{R}}^{\htmlClass{sdt-0000000117}{n}} \rightarrow \htmlClass{sdt-0000000045}{\mathbb{R}}_{\geq 0}\]

The idea behind the loss function is that the larger the mismatch between the prediction and target output, the bigger the loss should be.

  1. Consider the definition of the ground truth:

    The symbol \(y\) represents the ground truth of a sample in machine learning. Samples come as pairs of an input and its ground truth, also called the "target output".

  2. Consider now the definition of a prediction, the output of a model defined as:

    The symbol for a model is \(h\). It represents a machine learning model that takes an input and gives an output.


    given an input:

    The symbol \(u\) represents the input of a model.

  3. From here it follows that if the prediction and the target match, the model is correct and the loss should be \(0\); otherwise, the model is incorrect and the loss should grow with the size of the mismatch. We take the squared norm of the difference (expanded component-wise just after this derivation) so that the loss is never negative and so that very wrong outputs are 'punished' more heavily. Mathematically, this is expressed as:
    \[\htmlClass{sdt-0000000072}{L}(\htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}), \htmlClass{sdt-0000000037}{y}) = \Vert \htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}) - \htmlClass{sdt-0000000037}{y} \Vert ^2\]
    as required.
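
To see explicitly why this loss is never negative, the squared norm can be expanded component-wise (assuming, as in the general form above, that the prediction and the ground truth are vectors in \( \htmlClass{sdt-0000000045}{\mathbb{R}}^{\htmlClass{sdt-0000000117}{n}} \)):

\[\Vert \htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}) - \htmlClass{sdt-0000000037}{y} \Vert^2 = \sum_{i=1}^{\htmlClass{sdt-0000000117}{n}} \left( \htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u})_i - \htmlClass{sdt-0000000037}{y}_i \right)^2\]

Each term in the sum is a square and therefore non-negative, so the loss is \(0\) exactly when the prediction equals the target.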

Example

Let \( \htmlClass{sdt-0000000037}{y} \) be some target corresponding to an input, \( \htmlClass{sdt-0000000103}{u} \), where:

\[\htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}) = (3, 5, 6), \qquad \htmlClass{sdt-0000000037}{y} = (2, 4, 6)\]

We can now plug these into the equation to get:
\[\begin{align*}\htmlClass{sdt-0000000072}{L} &= \Vert (3, 5, 6) - (2, 4, 6) \Vert^2\\\htmlClass{sdt-0000000072}{L} &= \Vert (1, 1, 0) \Vert^2\\\htmlClass{sdt-0000000072}{L} &= (\sqrt{1^2 + 1^2 + 0^2})^2\\\htmlClass{sdt-0000000072}{L} &= \sqrt{2}^2 \\\htmlClass{sdt-0000000072}{L} &= 2 \end{align*}\]
Therefore, the loss is \( 2 \).
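
As a quick numerical check of this example, here is a short sketch assuming NumPy (the variable names are ours):

```python
import numpy as np

h_u = np.array([3.0, 5.0, 6.0])  # model prediction h(u)
y = np.array([2.0, 4.0, 6.0])    # ground truth y

loss = float(np.sum((h_u - y) ** 2))  # ||h(u) - y||^2
print(loss)                           # prints 2.0
```

This matches the value computed by hand above.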
