Goal of Supervised Learning

Prerequisites

Risk of a Model | \(R(\hat{f}) = E [L(\hat{f}(U), Y)]\)
Optimal Model | \( \hat{f} \)
Hypothesis Space | \( \mathcal{H} \)
Model | \( h \)

Description

The goal of all supervised learning algorithms is to determine the optimal model \( \htmlClass{sdt-0000000002}{\hat{f}} \) for a problem, that is, the model with the lowest risk. Conceptually, this means comparing the candidate models \( \htmlClass{sdt-0000000084}{h} \) in the hypothesis space \( \htmlClass{sdt-0000000039}{\mathcal{H}} \) and selecting the one whose risk is minimal. The risk is the expected value of a chosen loss function over the inputs and outputs of the problem, so it measures how the model is expected to perform on unseen data. Minimizing it therefore selects the model that generalizes best.

\[\htmlClass{sdt-0000000002}{\hat{f}} = h_{opt} = \underset{h \in \htmlClass{sdt-0000000039}{\mathcal{H}}}{\operatorname{argmin}} \hspace{0.2cm} \htmlClass{sdt-0000000031}{E}[\htmlClass{sdt-0000000072}{L}(\htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000013}{U}), \htmlClass{sdt-0000000021}{Y})]\]
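
The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method taken from the source: the hypothesis space is a small, hypothetical set of linear models \(h(u) = a \cdot u\), the data are synthetic, and the true risk (which requires the unknown distribution of \(U\) and \(Y\)) is approximated by an average over samples. All function and variable names are made up for this sketch.

```python
import random


def squared_loss(prediction, target):
    """Loss L: squared distance between prediction and target."""
    return (prediction - target) ** 2


def estimate_risk(model, samples, loss=squared_loss):
    """Approximate E[L(h(U), Y)] by the mean loss over (u, y) samples."""
    return sum(loss(model(u), y) for u, y in samples) / len(samples)


def make_linear_model(slope):
    """Return the candidate model h(u) = slope * u."""
    return lambda u: slope * u


# Assumption: synthetic data from y = 2u + small Gaussian noise.
rng = random.Random(0)
samples = []
for _ in range(1000):
    u = rng.uniform(-1.0, 1.0)
    y = 2.0 * u + rng.gauss(0.0, 0.1)
    samples.append((u, y))

# Assumption: a finite hypothesis space with slopes 0.0, 0.5, ..., 4.0.
candidate_slopes = [0.5 * k for k in range(9)]
hypothesis_space = {a: make_linear_model(a) for a in candidate_slopes}

# argmin over H: keep the candidate whose estimated risk is smallest.
risks = {a: estimate_risk(h, samples) for a, h in hypothesis_space.items()}
best_slope = min(risks, key=risks.get)
print("selected slope:", best_slope)
```

With a finite hypothesis space the \(\operatorname{argmin}\) reduces to an exhaustive comparison. For richer spaces such as neural networks, enumeration is not feasible and the search is typically carried out by an optimization procedure instead.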

Symbols Used:

\( \hat{f} \)

This symbol denotes the optimal model for a problem.

\( U \)

This symbol represents a random variable for the inputs of a problem.

\( Y \)

This symbol represents a random variable for the outputs of a problem.

\( E \)

This symbol represents the expected value (the mean) of a random variable under its distribution.

\( \mathcal{H} \)

This is the symbol representing the set of possible models.

\( L \)

This is the symbol for a loss function. It is a function that quantifies how far a model's prediction is from the correct output.
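
To make the role of \( L \) concrete, here is a minimal sketch of two commonly used loss functions. The function names are hypothetical and chosen for this illustration; the source does not prescribe a particular loss.

```python
def zero_one_loss(prediction, target):
    """Classification loss: 0 for a correct prediction, 1 for a wrong one."""
    return 0 if prediction == target else 1


def absolute_loss(prediction, target):
    """Regression loss: absolute distance between prediction and target."""
    return abs(prediction - target)


print(zero_one_loss("cat", "cat"))  # correct label
print(zero_one_loss("cat", "dog"))  # wrong label
print(absolute_loss(2.5, 4.0))      # prediction is off by 1.5
```

Which loss is appropriate depends on the problem. The risk, and therefore the model selected as optimal, can change when the loss changes.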

Derivation

  1. Recall the definition of risk of a model:
    \[\htmlClass{sdt-0000000062}{R}(\htmlClass{sdt-0000000002}{\hat{f}}) = \htmlClass{sdt-0000000031}{E} [\htmlClass{sdt-0000000072}{L}(\htmlClass{sdt-0000000002}{\hat{f}}(\htmlClass{sdt-0000000013}{U}), \htmlClass{sdt-0000000021}{Y})]\]
  2. Now consider the definition of the optimal model \( \htmlClass{sdt-0000000002}{\hat{f}} \).

    The symbol \(\hat{f}\) denotes the optimal model for a problem. It yields the lowest risk \( \htmlClass{sdt-0000000062}{R} \) for pairs of inputs and outputs. The goal of machine learning is to optimize \( \htmlClass{sdt-0000000084}{h} \) until it becomes \(\hat{f}\).

  3. We need to search a variety of models and select the one which yields the lowest risk. It might be wise to recall the definition of a model \( \htmlClass{sdt-0000000084}{h} \):

    The symbol for a model is \(h\). It represents a machine learning model that takes an input and gives an output.


    and of the hypothesis space \( \htmlClass{sdt-0000000039}{\mathcal{H}} \):

    The symbol \( \mathcal{H} \) denotes the set of possible models, often from a particular class like "polynomials of any degree" or "multi-layer perceptron networks". For any learning algorithm, \( \mathcal{H} \) indicates the space where an optimal model may be found.

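  4. Applying the risk from step 1 to each candidate \( \htmlClass{sdt-0000000084}{h} \), we can now use the \(\operatorname{argmin}\) operator to select the model in \( \htmlClass{sdt-0000000039}{\mathcal{H}} \) with the lowest risk:
    \[\htmlClass{sdt-0000000002}{\hat{f}} = h_{opt} = \underset{h \in \htmlClass{sdt-0000000039}{\mathcal{H}}}{\operatorname{argmin}} \hspace{0.2cm} \htmlClass{sdt-0000000031}{E}[\htmlClass{sdt-0000000072}{L}(\htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000013}{U}), \htmlClass{sdt-0000000021}{Y})]\]
    as required.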
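
As a small worked illustration with made-up numbers (the risks below are assumptions for this example, not results from the source), suppose the hypothesis space contains only three models, \( \mathcal{H} = \{h_1, h_2, h_3\} \), with risks

\[R(h_1) = 0.40, \quad R(h_2) = 0.15, \quad R(h_3) = 0.25\]

The \(\operatorname{argmin}\) operator returns the minimizing model itself, not the value of its risk:

\[\hat{f} = \underset{h \in \mathcal{H}}{\operatorname{argmin}} \hspace{0.2cm} R(h) = h_2, \qquad \underset{h \in \mathcal{H}}{\min} \hspace{0.2cm} R(h) = 0.15\]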

Example

See Purpose of Machine Learning for an empirical method to obtain the best model and Risk of Optimal Model for an example of how to use the \(\operatorname{argmin}\) operator.

References

  1. Jaeger, H. (n.d.). Neural Networks (AI) (WBAI028-05) Lecture Notes BSc program in Artificial Intelligence. Retrieved April 28, 2024, from https://www.ai.rug.nl/minds/uploads/LN_NN_RUG.pdf