Risk of a Model

Prerequisites

Optimal Model | \( \hat{f} \)
Random Variable Input | \( U \)
Random Variable Output | \( Y \)
Expectation | \( E \)
Loss Function | \( L \)

Description

The risk of a model offers a way to quantify the error a model is likely to make when generating predictions for various input and output pairs. Models with low risk demonstrate a greater ability to make accurate predictions on new, unseen data. This indicates they have successfully learned the underlying patterns within the dataset. On the other hand, models with high risk struggle to grasp these patterns, leading to inaccurate predictions. The specific mathematical definition of risk varies, as it depends on the chosen loss function and the strategy used to sample data from the random variables representing the inputs and ground truths.

\[\htmlClass{sdt-0000000062}{R}(\htmlClass{sdt-0000000002}{\hat{f}}) = \htmlClass{sdt-0000000031}{E} [\htmlClass{sdt-0000000072}{L}(\htmlClass{sdt-0000000002}{\hat{f}}(\htmlClass{sdt-0000000013}{U}), \htmlClass{sdt-0000000021}{Y})]\]
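In practice the expectation is rarely computable exactly, because the underlying distributions are unknown. As noted in the derivation below, a standard workaround (not stated explicitly in the formula above) is to approximate the risk by the sample mean of the loss over \(N\) input-output pairs \((u_i, y_i)\) drawn from the distributions of \(U\) and \(Y\):

\[R(\hat{f}) \approx \frac{1}{N} \sum_{i=1}^{N} L(\hat{f}(u_i), y_i)\]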

Symbols Used:

\( \hat{f} \)

This symbol denotes the optimal model for a problem.

\( U \)

This symbol represents a random variable for the inputs of a problem.

\( Y \)

This symbol represents a random variable for the outputs of a problem.

\( E \)

This symbol represents the average value of a distribution associated with a random variable.

\( R \)

This symbol denotes the risk of a model.

\( L \)

This is the symbol for a loss function. It is a function that quantifies how wrong a model's prediction is compared to the ground truth.

Derivation

The risk of a model is defined as the expected loss between the model's predictions and the ground truths:

  1. Consider the definition of the optimal model \( \htmlClass{sdt-0000000002}{\hat{f}} \) for a problem:

    The symbol \(\hat{f}\) denotes the optimal model for a problem. It yields the lowest risk \( \htmlClass{sdt-0000000062}{R} \) for pairs of inputs and outputs. The goal of machine learning is to optimize \( \htmlClass{sdt-0000000084}{h} \) until it becomes \(\hat{f}\).


  2. The model operates on inputs sampled from a random variable \( \htmlClass{sdt-0000000013}{U} \):

    The symbol \(U\) represents a random variable for the inputs of a problem. A random variable is a variable whose value is a numerical outcome of a random phenomenon. The inputs \( \htmlClass{sdt-0000000103}{u} \) to a model \( \htmlClass{sdt-0000000084}{h} \) are sampled from a probability distribution associated with \(U\).


    The model generates predictions:
    \[\htmlClass{sdt-0000000002}{\hat{f}}(\htmlClass{sdt-0000000013}{U})\]
  3. The predictions must be compared to the ground truths sampled from another random variable \( \htmlClass{sdt-0000000021}{Y} \):

    The symbol \(Y\) represents a random variable for the outputs of a problem. A random variable is a variable whose value is a numerical outcome of a random phenomenon. The ground truths \( \htmlClass{sdt-0000000037}{y} \), which a model \( \htmlClass{sdt-0000000084}{h} \) must predict, are sampled from a probability distribution associated with \(Y\).

  4. Now consider the definition of loss:

    The symbol for a loss function is \(L\). It is a function that quantifies how wrong a model's prediction is, typically as some kind of "distance" between the predicted value and the ground truth. Examples include Mean Squared Error (MSE) and Binary Cross Entropy (BCE).


    We see that it is a function which generates an error based on the model predictions and ground truths:
    \[\htmlClass{sdt-0000000072}{L}(\htmlClass{sdt-0000000002}{\hat{f}}(\htmlClass{sdt-0000000013}{U}), \htmlClass{sdt-0000000021}{Y})\]
  5. Next, we see that the expectation is defined as:

    This symbol \(E\) represents the average value of an entire distribution associated with a random variable. It is difficult to determine empirically as we cannot access the underlying probability distribution function. Instead, we can sample values from the distribution and calculate the sample mean accordingly.


    Assuming we have access to the underlying probability distribution of the loss, we get:
    \[\htmlClass{sdt-0000000062}{R}(\htmlClass{sdt-0000000002}{\hat{f}}) = \htmlClass{sdt-0000000031}{E}[\htmlClass{sdt-0000000072}{L}(\htmlClass{sdt-0000000002}{\hat{f}}(\htmlClass{sdt-0000000013}{U}), \htmlClass{sdt-0000000021}{Y})]\]
    as required.
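The expectation above can be estimated empirically by averaging the loss over samples drawn from \(U\) and \(Y\), as described in step 5. A minimal Python sketch, in which the model, the data distribution, and the squared-error loss are all illustrative assumptions rather than part of the definition:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def h(u):
    """A candidate model: a simple linear predictor (assumed for illustration)."""
    return 2.0 * u + 1.0

def squared_loss(prediction, ground_truth):
    """An example loss function L: squared error."""
    return (prediction - ground_truth) ** 2

# Sample inputs u from the distribution of the random variable U, and
# ground truths y from the distribution of Y (here: the same linear
# relationship plus Gaussian noise, an illustrative assumption).
n_samples = 100_000
u = rng.normal(loc=0.0, scale=1.0, size=n_samples)
y = 2.0 * u + 1.0 + rng.normal(scale=0.1, size=n_samples)

# The risk R(h) = E[L(h(U), Y)] is approximated by the sample mean of the loss.
empirical_risk = squared_loss(h(u), y).mean()
print(empirical_risk)
```

Because the model here matches the noiseless part of the data exactly, the estimated risk converges to the irreducible noise variance; a worse model would report a higher risk.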

References

  1. Jaeger, H. (n.d.). Neural Networks (AI) (WBAI028-05) Lecture Notes BSc program in Artificial Intelligence. Retrieved April 28, 2024, from https://www.ai.rug.nl/minds/uploads/LN_NN_RUG.pdf