The empirical risk of a model offers a practical way to estimate the risk \( \htmlClass{sdt-0000000062}{R} \) using the sample mean of the losses computed on inputs and ground truths drawn from random variables. The empirical risk serves as an approximation of the expected risk, which is the true average error the model would make over the entire distribution of possible input data. Because that distribution is not available in practice, we calculate the empirical risk on a finite dataset of samples instead.
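Written out with the symbols listed below, the two quantities are commonly expressed as follows. This is a sketch in standard notation rather than a formula quoted from this page: the dataset size \(N\), the sample index \(i\) and the superscript-free \(R(h)\) for the expected risk are notation introduced here for illustration.

\[\begin{align*} R(h) &= E\left[ L\big(h(U), Y\big) \right] \\ R^{emp}(h) &= \frac{1}{N} \sum_{i=1}^{N} L\big(h(u_i), y_i\big) \end{align*}\]

The first line averages the loss over the full joint distribution of \(U\) and \(Y\); the second replaces that average with the sample mean over the \(N\) observed pairs \((u_i, y_i)\).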
\( y \) | This symbol stands for the ground truth of a sample. In supervised learning this is often paired with the corresponding input. |
\( R \) | This symbol denotes the risk of a model. |
\( L \) | This is the symbol for a loss function. It is a function that measures how far a model's prediction is from the ground truth. |
\( h \) | This symbol denotes a model in machine learning. |
\( u \) | This symbol denotes the input of a model. |
The symbol \(U\) represents a random variable for the inputs of a problem. A random variable is a variable whose value is a numerical outcome of a random phenomenon. The inputs \( \htmlClass{sdt-0000000103}{u} \) to a model \( \htmlClass{sdt-0000000084}{h} \) are sampled from a probability distribution associated with \(U\).
The symbol \(Y\) represents a random variable for the outputs of a problem. A random variable is a variable whose value is a numerical outcome of a random phenomenon. The ground truths \( \htmlClass{sdt-0000000037}{y} \) which a model \( \htmlClass{sdt-0000000084}{h} \) must predict are sampled from a probability distribution associated with \(Y\).
The symbol \(E\) represents the expected value, that is, the average value over the entire distribution associated with a random variable. It usually cannot be computed exactly, because we do not have access to the underlying probability distribution. Instead, we can sample values from the distribution and use the sample mean as an estimate.
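The idea of replacing an expectation with a sample mean can be illustrated with a small simulation. The sketch below is only an illustration: the normal distribution, its mean of 2.0, the seed and the helper name `sample_mean` are all assumptions chosen for the demonstration and do not come from this page.

```python
import random


def sample_mean(draw, n):
    """Return the average of n independent values produced by calling draw()."""
    return sum(draw() for _ in range(n)) / n


# Assumption for illustration: we pick the distribution ourselves, so its
# true mean is known. In a real problem this value would be inaccessible.
true_mean = 2.0
rng = random.Random(0)  # fixed seed so the run is repeatable

for n in (10, 1_000, 100_000):
    estimate = sample_mean(lambda: rng.gauss(true_mean, 1.0), n)
    # The gap between the estimate and the true mean tends to shrink as n grows.
    print(n, round(estimate, 3), round(abs(estimate - true_mean), 3))
```

With more samples the estimate typically moves closer to the true mean, which is the same reason the empirical risk is a reasonable stand-in for the expected risk when the dataset is large enough.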
Suppose we have a regression task where we collected a dataset of 3 input and ground truth pairs as follows:
Now suppose we have model \( \htmlClass{sdt-0000000084}{h} \) whose predictions \( \htmlClass{sdt-0000000084}{h} \)(\( \htmlClass{sdt-0000000103}{u} \)) are as follows:
For this example, we will use the L2 loss. The losses obtained for the three pairs are 0, 1 and 0, respectively.
We can now plug these values into the equation for empirical risk:
\[\begin{align*} \htmlClass{sdt-0000000062}{R}^{emp}(\htmlClass{sdt-0000000084}{h}) &= \frac{0 + 1 + 0}{3}\\ \htmlClass{sdt-0000000062}{R}^{emp}(\htmlClass{sdt-0000000084}{h}) &= \frac{1}{3} \end{align*}\]
Therefore, the empirical risk of the model on this dataset is \( \frac{1}{3} \).
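The same calculation can be sketched in a few lines of Python. The function names `l2_loss`, `mean_loss` and `empirical_risk` are hypothetical helpers written for this illustration, and the L2 loss is assumed here to mean the squared error between prediction and ground truth. Only the three loss values from the equation above are used as data.

```python
from fractions import Fraction


def l2_loss(prediction, truth):
    """Squared error between a prediction and its ground truth (assumed form of the L2 loss)."""
    return (prediction - truth) ** 2


def mean_loss(losses):
    """Sample mean of per-sample losses, kept as an exact fraction."""
    losses = list(losses)
    if not losses:
        raise ValueError("need at least one loss value")
    return sum(Fraction(loss) for loss in losses) / len(losses)


def empirical_risk(h, pairs, loss=l2_loss):
    """Empirical risk of model h on an iterable of (input, ground truth) pairs."""
    return mean_loss(loss(h(u), y) for u, y in pairs)


# Per-sample losses taken from the worked example above.
example_losses = [0, 1, 0]
print(mean_loss(example_losses))  # 1/3
```

`mean_loss` covers the case where the losses are already known, as in this example, while `empirical_risk` shows how a model, a dataset and a loss function would be combined when starting from raw pairs.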