Risk of Optimal Model

Description

The risk of an optimal model describes an empirical method for determining the optimal model $\hat{f}$ for a given problem. It evaluates every candidate model $h$ from the hypothesis space $\mathcal{H}$ on a sampled dataset and selects the model with the minimum empirical risk, that is, the lowest average loss $L$ over the $N$ samples.

$$\hat{f} = h_{opt} = \underset{h \in \mathcal{H}}{\operatorname{argmin}} \; \frac{1}{N} \sum_{i=1}^{N} L(h(u_i), y_i)$$
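The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes a finite list of candidate models, a squared-error loss for $L$, and toy data following $y = 2u$. The names `squared_loss`, `empirical_risk`, `select_model` and the three candidate functions are hypothetical and chosen only for this example.

```python
from typing import Callable, Sequence

Model = Callable[[float], float]


def squared_loss(prediction: float, target: float) -> float:
    """One possible choice for the loss L (an assumption of this sketch)."""
    return (prediction - target) ** 2


def empirical_risk(h: Model, inputs: Sequence[float], targets: Sequence[float],
                   loss: Callable[[float, float], float] = squared_loss) -> float:
    """Average loss of model h over the N samples (u_i, y_i)."""
    n = len(inputs)
    return sum(loss(h(u), y) for u, y in zip(inputs, targets)) / n


def select_model(candidates: Sequence[Model], inputs: Sequence[float],
                 targets: Sequence[float]) -> Model:
    """argmin over a finite hypothesis space: the candidate with the lowest empirical risk."""
    return min(candidates, key=lambda h: empirical_risk(h, inputs, targets))


# Toy dataset following y = 2u (illustrative data, not from the source).
inputs = [0.0, 1.0, 2.0, 3.0]
targets = [0.0, 2.0, 4.0, 6.0]


def h_identity(u: float) -> float:
    return u


def h_double(u: float) -> float:
    return 2.0 * u


def h_constant(u: float) -> float:
    return 3.0


candidates = [h_identity, h_double, h_constant]
f_hat = select_model(candidates, inputs, targets)
print(f_hat.__name__, empirical_risk(f_hat, inputs, targets))
```

Enumerating candidates like this only works when the hypothesis space is finite and small. For larger or continuous spaces, the minimum is usually searched for with an optimization procedure instead.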

Symbols Used:

$\hat{f}$

This symbol denotes the optimal model for a problem.

$y$

This symbol stands for the ground truth of a sample. In supervised learning this is often paired with the corresponding input.

$\mathcal{H}$

This is the symbol representing the set of possible models.

$h$

This symbol denotes a model in machine learning.

$u$

This symbol denotes the input of a model.

Derivation

  1. Recall the definition of the empirical risk of a model:
    $$R^{emp}(h) = \frac{1}{N} \sum_{i=1}^{N} L(h(u_i), y_i)$$
  2. Now suppose that all our models $h$ are drawn from a hypothesis space $\mathcal{H}$:

    The symbol $\mathcal{H}$ denotes the set of possible models, often from a particular class like "polynomials of any degree" or "multi-layer perceptron networks". For any learning algorithm, $\mathcal{H}$ indicates the space where an optimal model may be found.

  3. Using the definition of the optimal model $\hat{f}$:

    The symbol $\hat{f}$ denotes the optimal model for a problem. It yields the lowest risk $R$ for pairs of inputs and outputs. The goal of machine learning is to optimize $h$ until it becomes $\hat{f}$.


    We observe that we need to take the model $h \in \mathcal{H}$ with the lowest empirical risk. This can be done using the argmin operator.
  4. Therefore, we obtain
    $$\hat{f} = \underset{h \in \mathcal{H}}{\operatorname{argmin}} \; \frac{1}{N} \sum_{i=1}^{N} L(h(u_i), y_i)$$
    as required.
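The four steps of the derivation can be traced on a small parametrized model class. The sketch below assumes a hypothetical hypothesis space of linear models $h_w(u) = w \cdot u$ restricted to a finite grid of slopes, a squared-error loss, and made-up samples that roughly follow $y = 1.5u$. None of these choices come from the source.

```python
# Hypothetical toy samples (u_i, y_i), roughly following y = 1.5 * u.
samples = [(1.0, 1.4), (2.0, 3.1), (3.0, 4.4), (4.0, 6.1)]


def make_linear_model(w):
    """Return the model h_w(u) = w * u."""
    return lambda u: w * u


def empirical_risk(h, samples):
    """Step 1: mean squared loss of h over the N samples."""
    return sum((h(u) - y) ** 2 for u, y in samples) / len(samples)


# Step 2: a finite hypothesis space, here the slopes 0.0, 0.5, ..., 3.0.
slopes = [0.5 * k for k in range(7)]
hypothesis_space = {w: make_linear_model(w) for w in slopes}

# Steps 3 and 4: take the argmin of the empirical risk over the hypothesis space.
best_slope = min(
    hypothesis_space,
    key=lambda w: empirical_risk(hypothesis_space[w], samples),
)
f_hat = hypothesis_space[best_slope]
print(best_slope, empirical_risk(f_hat, samples))
```

The selected model is only optimal relative to the chosen grid: a slope that is not on the grid can never be returned, which illustrates why the choice of $\mathcal{H}$ bounds what the argmin can find.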

Example

Suppose we have the following models, with their empirical risks calculated on an arbitrary dataset of samples:

$$\begin{aligned} R^{emp}(h_1) &= 3 \\ R^{emp}(h_2) &= 2.3 \\ R^{emp}(h_3) &= 6 \end{aligned}$$
Using the equation described above, we observe that the optimal model $\hat{f}$ is the model $h$ with the lowest empirical risk.

Therefore, we obtain $\hat{f} = h_2$.
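The same selection can be checked mechanically. The short sketch below stores the three empirical risks from the example in a dictionary (the keys `"h1"`, `"h2"`, `"h3"` are just labels chosen here) and picks the entry with the smallest value using Python's built-in `min`.

```python
# Empirical risks from the example above, keyed by model label.
empirical_risks = {"h1": 3.0, "h2": 2.3, "h3": 6.0}

# argmin: the label whose empirical risk is smallest.
best_name = min(empirical_risks, key=empirical_risks.get)
print(best_name, empirical_risks[best_name])
```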

References

  1. Jaeger, H. (n.d.). Neural Networks (AI) (WBAI028-05) Lecture Notes BSc program in Artificial Intelligence. Retrieved April 14, 2024, from https://www.ai.rug.nl/minds/uploads/LN_NN_RUG.pdf