The risk of a model quantifies the error the model is expected to make when predicting outputs from inputs. Models with low risk make more accurate predictions on new, unseen data, which indicates that they have learned the underlying patterns in the dataset. Models with high risk fail to capture these patterns and therefore predict inaccurately. The exact mathematical definition of risk varies: it depends on the chosen loss function and on how data is sampled from the random variables representing the inputs and ground truths.
| Symbol | Description |
| --- | --- |
| \( \hat{f} \) | This symbol denotes the optimal model for a problem. |
| \( U \) | This symbol represents a random variable for the inputs of a problem. |
| \( Y \) | This symbol represents a random variable for the outputs of a problem. |
| \( E \) | This symbol represents the average value of a distribution associated with a random variable. |
| \( R \) | This symbol denotes the risk of a model. |
| \( L \) | This is the symbol for a loss function. It is a function that calculates how wrong a model's inference is compared to where it should be. |
The risk of a model \( h \) is defined as the expected loss between the model's predictions and the ground truths:

\[ R(h) = E\left[ L\left( h(U), Y \right) \right] \]
The symbol \(\hat{f}\) denotes the optimal model for a problem: the model that yields the lowest risk \( R \) over pairs of inputs and outputs. The goal of machine learning is to adjust a model \( h \) until it approximates \(\hat{f}\) as closely as possible.
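To make the idea of "lowest risk" concrete, the following is a minimal sketch that compares a few candidate models on sampled data and picks the one with the smallest average loss. Everything in it is an assumption made for illustration: the data-generating rule (`y = 2u` plus noise), the three candidate models, the squared loss, and the helper name `empirical_risk` do not come from the text above.

```python
import random


def squared_loss(prediction, truth):
    # Per-example loss: squared difference between prediction and ground truth.
    return (prediction - truth) ** 2


def empirical_risk(model, pairs, loss):
    # Average loss of the model over the sampled (input, ground truth) pairs.
    return sum(loss(model(u), y) for u, y in pairs) / len(pairs)


# Hypothetical data: inputs u in [-1, 1], ground truths y = 2u + small noise.
rng = random.Random(0)
pairs = []
for _ in range(1000):
    u = rng.uniform(-1.0, 1.0)
    pairs.append((u, 2.0 * u + rng.gauss(0.0, 0.1)))

# Hypothetical candidate models h; each one is a line through the origin.
candidates = {
    "slope_1": lambda u: 1.0 * u,
    "slope_2": lambda u: 2.0 * u,
    "slope_3": lambda u: 3.0 * u,
}

risks = {name: empirical_risk(h, pairs, squared_loss) for name, h in candidates.items()}
best_name = min(risks, key=risks.get)  # candidate with the lowest estimated risk
```

Under these assumptions the candidate whose slope matches the data-generating rule ends up with the lowest estimated risk. It is only the best of the three candidates considered, which is not necessarily the optimal model over all possible models.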
The symbol \(U\) represents a random variable for the inputs of a problem. A random variable is a variable whose value is a numerical outcome of a random phenomenon. The inputs \( u \) to a model \( h \) are sampled from a probability distribution associated with \(U\).
The symbol \(Y\) represents a random variable for the outputs of a problem. The ground truths \( y \) that a model \( h \) must predict are sampled from a probability distribution associated with \(Y\).
The symbol for a loss function is \(L\). It is a function that measures how far a model's prediction is from the value it should have produced. It typically represents some kind of "distance" between a predicted value and the "ground truth". Examples include Mean Squared Error (MSE) and Binary Cross Entropy (BCE).
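As an illustration of the "distance" idea, here is a small sketch of two per-example losses: squared error (the per-example term that mean squared error averages) and binary cross entropy. The function names and the clipping constant `eps` are choices made for this example, not something the text above prescribes.

```python
import math


def squared_error(prediction, truth):
    # Zero when the prediction matches the ground truth, and grows
    # quadratically with the difference between them.
    return (prediction - truth) ** 2


def binary_cross_entropy(probability, label, eps=1e-12):
    # `probability` is the predicted probability of class 1, `label` is 0 or 1.
    # Clipping keeps the logarithm finite for predictions of exactly 0 or 1.
    p = min(max(probability, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
```

Both functions return small values for predictions close to the ground truth and larger values as the prediction moves away from it, which is the property a loss function needs in order to rank models by risk.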
The symbol \(E\) represents the expected value, that is, the average over the entire distribution associated with a random variable. It usually cannot be computed exactly because the underlying probability distribution is not accessible. Instead, we sample values from the distribution and use the sample mean as an estimate.
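The sample-mean approach can be sketched on a toy problem where the exact expectation happens to be known, so the estimate can be checked. The setup is assumed purely for illustration: inputs are drawn uniformly from [0, 1], the ground truth equals the input, the model always predicts 0, and the loss is squared error, so the exact risk is the expected value of \( u^2 \), which is 1/3. The helper names are hypothetical.

```python
import random


def squared_loss(prediction, truth):
    return (prediction - truth) ** 2


def zero_model(u):
    # A deliberately poor model that ignores its input.
    return 0.0


def sample_pair(rng):
    # Assumed toy distribution: u is uniform on [0, 1] and y equals u.
    u = rng.random()
    return u, u


def estimate_risk(model, loss, sampler, n, seed=0):
    # Sample mean of the loss over n independently drawn (u, y) pairs.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u, y = sampler(rng)
        total += loss(model(u), y)
    return total / n


exact_risk = 1.0 / 3.0  # known only because the toy distribution is known
estimate = estimate_risk(zero_model, squared_loss, sample_pair, n=100_000)
```

With this many samples the estimate should land close to the exact value. In a real problem the exact value is unavailable, and the sample mean is all that can be measured.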