Loss Function

How do we know if our neural network is doing a good job? We need a score to measure its performance. A Loss Function (or Cost Function) quantifies the error between the network’s prediction and the actual target.

Different tasks require different loss functions. Here, $\hat{y}$ represents the predicted value (our activation output $a$ ) and $y$ represents the true target.

Mean Absolute Error (MAE): often used for regression when outliers shouldn’t be penalized too heavily.

L = \frac{1}{n} \sum_{i=1}^{n} |y^{(i)} - \hat{y}^{(i)}|

Cross-Entropy Loss: The standard for classification tasks like image classification and also used for predicting the next token in GPT models.

L = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{c}^{(i)} \log(\hat{y}_{c}^{(i)})

Chat with Mike 3.0