Empirical Risk of a Model

Prerequisites

Risk of a Model | \(R(\hat{f}) = E [L(\hat{f}(U), Y)]\)
Random Variable Input | \( U \)
Random Variable Output | \( Y \)
Expectation | \( E \)

Description

The empirical risk of a model offers a practical way to estimate the risk \( \htmlClass{sdt-0000000062}{R} \) using the sample mean of the losses over inputs and ground truths drawn from the underlying random variables. The empirical risk serves as an approximation of the expected risk, which is the true error the model would make over the entire distribution of possible input data. Due to practical limitations, we typically calculate the empirical risk on a finite dataset of samples rather than over the complete theoretical distribution.

\[\htmlClass{sdt-0000000062}{R}^{emp}(\htmlClass{sdt-0000000084}{h}) = \frac{1}{N} \sum^{N}_{i=1} \htmlClass{sdt-0000000072}{L} (\htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}_i), \htmlClass{sdt-0000000037}{y}_i)\]
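As an illustration of this formula, here is a minimal Python sketch; the model, the squared-error loss, and the sample values are hypothetical choices made for this example, not part of the definition.

```python
import numpy as np

def empirical_risk(h, loss, inputs, ground_truths):
    """Sample mean of the losses: R^emp(h) = (1/N) * sum_i L(h(u_i), y_i)."""
    losses = [loss(h(u_i), y_i) for u_i, y_i in zip(inputs, ground_truths)]
    return float(np.mean(losses))

# Hypothetical model, loss, and data, used only to exercise the function.
h = lambda u: 2.0 * u                         # a toy model
l2 = lambda pred, truth: (pred - truth) ** 2  # squared-error (L2) loss
u = [0.0, 1.0, 2.0]                           # sampled inputs u_i
y = [0.1, 2.2, 3.9]                           # corresponding ground truths y_i

print(empirical_risk(h, l2, u, y))            # ~0.02, the average loss over the 3 samples
```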

Symbols Used:

\( y \)

This symbol stands for the ground truth of a sample. In supervised learning, this is often paired with the corresponding input.

\( R \)

This symbol denotes the risk of a model.

\( L \)

This is the symbol for a loss function. It is a function that quantifies how wrong a model's inference is compared to the ground truth (a brief numerical illustration of one such loss follows this list).

\( h \)

This symbol denotes a model in machine learning.

\( u \)

This symbol denotes the input of a model.
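As a concrete illustration of a loss function, here is a minimal Python sketch of the L2 (squared-error) loss used in the Example below; the function name and the sample values are assumptions made for illustration.

```python
def l2_loss(prediction: float, ground_truth: float) -> float:
    """L2 (squared-error) loss: how far the model's inference h(u) lies from the ground truth y."""
    return (prediction - ground_truth) ** 2

# A prediction of 4.0 against a ground truth of 3.0 incurs a loss of 1.0.
print(l2_loss(4.0, 3.0))  # 1.0
```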

Derivation

  1. Consider the formula for expected risk:
    \[\htmlClass{sdt-0000000062}{R}(\htmlClass{sdt-0000000002}{\hat{f}}) = \htmlClass{sdt-0000000031}{E} [\htmlClass{sdt-0000000072}{L}(\htmlClass{sdt-0000000002}{\hat{f}}(\htmlClass{sdt-0000000013}{U}), \htmlClass{sdt-0000000021}{Y})]\]
    Instead of the optimal model \( \htmlClass{sdt-0000000002}{\hat{f}} \), we can calculate the risk of any candidate model \( \htmlClass{sdt-0000000084}{h} \).
  2. Now recall the definitions of the random variables:

    The symbol \(U\) represents a random variable for the inputs of a problem. A random variable is a variable whose value is a numerical outcome of a random phenomenon. The inputs \( \htmlClass{sdt-0000000103}{u} \) to a model \( \htmlClass{sdt-0000000084}{h} \) are sampled from a probability distribution associated with \(U\).


    The symbol \(Y\) represents a random variable for the outputs of a problem. The ground truths \( \htmlClass{sdt-0000000037}{y} \) which a model \( \htmlClass{sdt-0000000084}{h} \) must predict are sampled from a probability distribution associated with \(Y\).

  3. Instead of the random variables \( \htmlClass{sdt-0000000013}{U} \) and \( \htmlClass{sdt-0000000021}{Y} \), we need to use samples \(\htmlClass{sdt-0000000103}{u}_i\) and \(\htmlClass{sdt-0000000037}{y}_i\) drawn from their respective distributions. Let's suppose that the total number of samples drawn is \(N\).
  4. Therefore, the new loss for each input and output pair will be:
    \[\htmlClass{sdt-0000000072}{L}(\htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}_i), \htmlClass{sdt-0000000037}{y}_i)\]
  5. We see that the definition of the expectation is as follows:

    This symbol \(E\) represents the average value of an entire distribution associated with a random variable. It is difficult to determine empirically as we cannot access the underlying probability distribution function. Instead, we can sample values from the distribution and calculate the sample mean accordingly.

  6. For an empirical calculation, we therefore use the sample mean of the losses obtained for each input and output pair: we add up all the losses and divide by the number of samples \(N\) (a small numerical sketch of this step follows the derivation). Thus, we obtain:
    \[\htmlClass{sdt-0000000062}{R}^{emp}(\htmlClass{sdt-0000000084}{h}) = \frac{1}{N} \sum^{N}_{i=1} \htmlClass{sdt-0000000072}{L} (\htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}_i), \htmlClass{sdt-0000000037}{y}_i)\]
    as required.
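To make the sample-mean step concrete, the sketch below draws samples from an assumed joint distribution of the inputs and outputs and shows the empirical risk settling as \(N\) grows; the data-generating process, the model, and the loss are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data-generating process: y = 3u + Gaussian noise (std 0.5).
def sample_pair():
    u = rng.uniform(-1.0, 1.0)
    y = 3.0 * u + rng.normal(scale=0.5)
    return u, y

h = lambda u: 3.0 * u                            # a toy model to evaluate
loss = lambda pred, truth: (pred - truth) ** 2   # L2 loss

# The sample mean of the losses approximates the expectation E[L(h(U), Y)],
# which for this particular toy setup equals the noise variance, 0.25.
for N in (10, 100, 10_000):
    losses = [loss(h(u_i), y_i) for u_i, y_i in (sample_pair() for _ in range(N))]
    print(N, round(float(np.mean(losses)), 3))
```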

Example

Suppose we have a regression task where we collected a dataset of 3 input and ground truth pairs \((\htmlClass{sdt-0000000103}{u}_i, \htmlClass{sdt-0000000037}{y}_i)\).

Now suppose we have a model \( \htmlClass{sdt-0000000084}{h} \) that produces a prediction \( \htmlClass{sdt-0000000084}{h} \)(\( \htmlClass{sdt-0000000103}{u}_i \)) for each input.

For this example, we will use the L2 loss. The losses obtained for the three pairs are 0, 1, and 0 respectively.

We can now plug these values into the equation for empirical risk:
\[\begin{align*} \htmlClass{sdt-0000000062}{R}^{emp}(\htmlClass{sdt-0000000084}{h}) &= \frac{0 + 1 + 0}{3}\\ \htmlClass{sdt-0000000062}{R}^{emp}(\htmlClass{sdt-0000000084}{h}) &= \frac{1}{3} \end{align*}\]
Therefore, the empirical risk of the model is \( \frac{1}{3} \).
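The same arithmetic can be checked in a few lines of Python; the loss values 0, 1, and 0 are taken directly from the example above.

```python
losses = [0.0, 1.0, 0.0]            # L2 losses for the three pairs
risk = sum(losses) / len(losses)    # (0 + 1 + 0) / 3
print(risk)                         # 0.3333...
```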
