Because the empirical risk is a sum (or average) over the training samples, its gradient is the corresponding sum of per-sample gradients. This is also exactly how the gradient is computed in practice: in one sweep through the training set (an 'epoch'), we accumulate, for each training sample, the gradient of the loss with respect to the parameters.
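Concretely, writing \( n \) for the number of training samples and \( \hat{\mathbf{y}}_i \) for the target of the \( i \)-th sample (two symbols not listed in the table below), linearity of the gradient gives

\[
\nabla_\theta R(\theta) \;=\; \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta\, L\big(\mathbf{y}_i,\, \hat{\mathbf{y}}_i\big), \qquad \mathbf{y}_i = \mathcal{N}(u_i;\, \theta).
\]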
| Symbol | Description |
| --- | --- |
| \( \mathcal{N} \) | This is the symbol used for a function approximator, typically a neural network. |
| \( i \) | This is the symbol for an index, a variable that ranges over the elements of a sequence (here, the training samples). |
| \( R \) | This symbol denotes the risk of a model; the empirical risk is its estimate obtained by averaging the loss over the training set. |
| \( \theta \) | This is the symbol we use for model weights/parameters. |
| \( \mathbf{y} \) | This symbol represents the output activation vector of a neural network. |
| \( L \) | This is the symbol for a loss function. It is a function that quantifies how far a model's output is from the target output. |
| \( \nabla \) | This symbol represents the gradient of a function; e.g., \( \nabla_\theta L \) is the vector of partial derivatives of \( L \) with respect to the parameters \( \theta \). |
| \( u \) | This symbol denotes the input of a model. |
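As a minimal sketch of the epoch sweep described above, assuming a toy linear model with a squared-error loss (the model, the loss, and the names `per_sample_grad` and `empirical_risk_grad` are illustrative assumptions, not definitions from the text):

```python
import numpy as np

# Toy stand-in for the network N(u; theta): a linear model with a
# squared-error loss, chosen only to keep the sketch self-contained.
def per_sample_grad(theta, u, target):
    y = theta @ u                      # model output y for one input u
    return 2.0 * (y - target) * u      # gradient of (y - target)^2 w.r.t. theta

def empirical_risk_grad(theta, inputs, targets):
    """One sweep (epoch) through the training set: the gradient of the
    empirical risk is the average of the per-sample loss gradients."""
    grad = np.zeros_like(theta)
    for u_i, t_i in zip(inputs, targets):   # i indexes the training samples
        grad += per_sample_grad(theta, u_i, t_i)
    return grad / len(inputs)

# Usage with synthetic data: 8 samples, 3 input features.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(8, 3))
targets = inputs @ np.array([1.0, -2.0, 0.5])  # targets from a known model
theta = np.zeros(3)
print(empirical_risk_grad(theta, inputs, targets))
```

The loop makes the aggregation explicit for clarity; in practice the same average is usually computed with vectorized or batched operations.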